Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / December 2006

Tip: Looking for answers? Try searching our database.

FTI, Searching and other Filters

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Kyle Jedrusiak - 13 Dec 2006 19:09 GMT
We have this table...

CREATE TABLE [dbo].[Document](
   [DocumentID] [int] IDENTITY(1,1) NOT NULL,
   [HumanResourceID] [int] NOT NULL,
   [Name] [nvarchar](256) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
   [Description] [nvarchar](256) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL,
   [ContentType] [nchar](4) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
   [Content] [image] NOT NULL,
   [DateEntered] [datetime] NOT NULL,
   [DateModified] [datetime] NULL,
   [Version] [timestamp] NOT NULL,
   [EmployeeID] [int] NULL,
CONSTRAINT [Resume_PK] PRIMARY KEY CLUSTERED
(
   [ResumeID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Content contains the bits that make up either Word or RTF documents.

We have FTI defined on Content / ContentType / DocumentID.  Generally FT
searches are working.

The table contains over 130k documents.

Through our application we are limiting their searches to the top 1500 rank
of any FT search.  (So as not to over-burden our server.)

This works when the want to search the table for documents within the entire
company.

But what they would really like is the top 1500 rank for documents within
their office.

Is there a way to partition the the table by an OfficeID with some what to
pre-filter so the FT search is only looking at Documents from one or more
OfficeIDs?

TIA - Kyle!
Kyle Jedrusiak - 13 Dec 2006 19:38 GMT
I think I found my answer...although a lot of work.

Remove the FTI from the table.

Add an OfficeID column to the table and populate it.

Partition the table by the OfficeID column.

Create an Indexed View for each OfficeID
(Open a new office, the add a new OfficeID and a new Indexed View for that
OfficeID.)
(Close an existing Office, migrate the documents to a different office, drop
the FTI for that View and drop that View)

Add a FTI to each of the Indexed Views.

Mod the application so it knows what how to FT search one or more Indexed
Views and combine the results from multiple views if needed.

> We have this table...
>
[quoted text clipped - 38 lines]
>
> TIA - Kyle!
Simon Sabin - 15 Dec 2006 00:32 GMT
Hello Kyle,

The other option is to add the office ID to the content (If the content was
editable i.e. text/html)

Then use a query like containstable(document, content,'OFFICE2345 AND "SQL
SERVER DBA"')

If the content is editabel this is by far the more manageable, and scalable.

We did the index view thing and it is just not a neat solution. The token
thing is much easier.

Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons

> I think I found my answer...although a lot of work.
>
[quoted text clipped - 59 lines]
>>
>> TIA - Kyle!
ddaiker - 18 Dec 2006 20:15 GMT
Um, maybe I'm missing something here, but since you are using the rank
I'm assuming you are using CONTAINSTABLE or FREETEXTTABLE and joining
back to Document.  I'm also assuming you are using a "TOP 1500" and an
"ORDER BY Rank" in your query.  Could you then just add "AND OfficeID =
'USEROFFICEID'" to your WHERE clause after adding the OfficeID field to
the table?

> Hello Kyle,
>
[quoted text clipped - 75 lines]
>
> >> TIA - Kyle!- Hide quoted text -- Show quoted text -
Kyle Jedrusiak - 20 Dec 2006 17:26 GMT
Looks like I'm stuck.

In order to create a Full Text Index on my view, the view has to have a
unique index.

SQL 2005 doesn't allow you to create a index on a view if the view contains
an text, ntext, image or xml columns.

And mine does because we're storint the resume in an image column and the
resume is what we're after.

We can't add any special tokens, the content is an image field.

Any other ideas?

> Hello Kyle,
>
[quoted text clipped - 77 lines]
>>>
>>> TIA - Kyle!
Lakusha - 21 Dec 2006 03:06 GMT
Kyle,

a) you can create an index on a view if you use VARCHAR(MAX) or
VARBINARY(MAX) instead of TEXT or IMAGE (which are deprecated in sql 2005).

b) AFAIK, those "tokens" can be other columns from the same view/table that
are also indexed in that FT catalog. You just specify CONTAINS(*,... Instead
of CONTAINS(MyClobColumn,...

it should work.

> Looks like I'm stuck.
>
[quoted text clipped - 92 lines]
>>>>
>>>> TIA - Kyle!
Kyle Jedrusiak - 26 Dec 2006 14:23 GMT
I will have to try changing the type over to VARBINARY(MAX).  That may work.

It mght also be easier to FTI a second column since that will require less
program changes.

Thanks

> Kyle,
>
[quoted text clipped - 104 lines]
>>>>>
>>>>> TIA - Kyle!
Kyle Jedrusiak - 26 Dec 2006 18:19 GMT
I added Office nvchar(8) to my table and populated int

select count(DocumentID) from Document where contains(*, 'Office01 and
cobol')

> Kyle,
>
[quoted text clipped - 104 lines]
>>>>>
>>>>> TIA - Kyle!
Kyle Jedrusiak - 26 Dec 2006 18:24 GMT
I added a Office nvchar(8) column and populated it.

No matter what I specify for a search condition, it doesn't return what I'm
after.

For a test I tried

select count(DocumentID) from Document where contains(*, 'Office02 and
cobol')

'Office02' is in the new column, 'cobol' is in the image column.

If I search for them seperately there is overlap so the data is correct.

Ideas?

> Kyle,
>
[quoted text clipped - 104 lines]
>>>>>
>>>>> TIA - Kyle!
Kyle Jedrusiak - 26 Dec 2006 18:36 GMT
To be clear the 'Office' column will only ever hold an office identifier
liket 'Office01' or 'Office13'.

The Content column has the interedting data.

We want to use containstable to give us the top n by rank DocumentIDs for a
particular office.

>I added a Office nvchar(8) column and populated it.
>
[quoted text clipped - 120 lines]
>>>>>>
>>>>>> TIA - Kyle!
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.