I am fairly new to full-text search (FTS) and am stuck on a problem that
doesn't seem to be common. In all my searching, I have come across only one
post describing the same issue, and that was from 2002:
http://groups.google.com.au/group/microsoft.public.sqlserver.fulltext/browse_thread/thread/64e0e6b739e7835f/4164233bb5cf813c?hl=en&lnk=gst&q=rebuild+catalog+different+unique+key+count#4164233bb5cf813c
The only suggestion in that post (answered by Hilary Cotter) was that the
problem could be related to using change tracking with background
indexing, or to very high CPU activity on the server. Neither is the
case here.
To relate my problem:
I am trying to use FTS to perform content searching on only the pdf
files stored in my database.
While trying to discover why I was getting inaccuracies, I found that
if I rebuild the catalog then do a full population, the unique key
count is different each time.
I tested it 12 times in a row, and the unique key counts were:
Min 10,530
Max 42,731
Average 24,888
Median 21,455
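For anyone wanting to repeat this test from a script rather than by hand, here is a minimal Python sketch of one rebuild-and-populate trial. It assumes a DB-API cursor that accepts `?` parameter markers (for example one obtained from pyodbc) on a connection in autocommit mode; the function name `rebuild_and_count` and its arguments are hypothetical. The only SQL Server pieces it relies on are `sp_fulltext_catalog` with the `rebuild` and `start_full` actions and `FULLTEXTCATALOGPROPERTY` with `PopulateStatus`, `ItemCount`, and `UniqueKeyCount`.

```python
import time


def rebuild_and_count(cursor, catalog, poll_seconds=5.0, timeout_seconds=3600.0):
    """Rebuild a full-text catalog, run a full population, and return
    (ItemCount, UniqueKeyCount) once the catalog reports idle.

    `cursor` is assumed to be a DB-API cursor using '?' parameter markers.
    """
    # Drop and recreate the catalog structures, then start a full population.
    cursor.execute("EXEC sp_fulltext_catalog ?, 'rebuild'", (catalog,))
    cursor.execute("EXEC sp_fulltext_catalog ?, 'start_full'", (catalog,))

    # Poll until PopulateStatus returns 0 (idle) or the timeout expires.
    deadline = time.monotonic() + timeout_seconds
    while True:
        cursor.execute(
            "SELECT FULLTEXTCATALOGPROPERTY(?, 'PopulateStatus')", (catalog,)
        )
        if cursor.fetchone()[0] == 0:
            break
        if time.monotonic() > deadline:
            raise TimeoutError("full population did not finish in time")
        time.sleep(poll_seconds)

    # Read both counters in one round trip.
    cursor.execute(
        "SELECT FULLTEXTCATALOGPROPERTY(?, 'ItemCount'), "
        "FULLTEXTCATALOGPROPERTY(?, 'UniqueKeyCount')",
        (catalog, catalog),
    )
    row = cursor.fetchone()
    return row[0], row[1]
```

One caveat with this sketch: if the status is polled before the population has actually started, it may read idle too early, so a short delay before the first poll may be needed.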
I am running SQL Server 2000 and Adobe iFilter 6 on XP SP2. This is a
development machine, so there is no external interference to alter my
testing. I was also able to duplicate the issue on a Windows 2000
server running SQL Server 2000 and iFilter 6.
I am indexing only one table with a single catalog. I used the wizard
to create this and checked Hilary's site
(http://www.indexserverfaq.com/SQLFTIWizard.htm) to ensure I was doing it
correctly.
The table holding the files contains a total of 1261 files, of which
747 are pdfs. The rest are a variety of .wpd, .doc, .zip, .tif, .exe,
.jpeg, and .dwg files.
The table has the following layout.
    Column           Type              Length  Nullable
3   gVfsId           uniqueidentifier  16      0
0   sExt             varchar           10      1
0   imgTarget        image             16      1
0   sCrc             varchar           100     1
0   iFilesize        int               4       1
0   bEncrypted       bit               1       1
0   bRebuild         bit               1       1
0   dteCatIncrement  timestamp         8       1
If someone could at least point me in a direction to rectify this
issue, I will be eternally grateful.
Cheers,
Rick.
Hilary Cotter - 07 Feb 2008 16:48 GMT
There was a problem in the RTM release and the earlier service packs of SQL
Server 2000 that behaved like this. It was fixed in one of
the later service packs, SP4 IIRC.
Are you doing a full population each time?
PhantomRick - 08 Feb 2008 01:53 GMT
Although I had applied SP4 originally, I re-applied it this morning.
I then re-tested the catalog, each time rebuilding the catalog
and then running a full population through Enterprise Manager. The
results were:
Item count / unique word count
1260 46997
1260 36005
1260 38896
1261 32451
1260 31806
1261 33381
Oh, it's now 1262 files - I added another this morning from other
testing, but before running this test.
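To compare these six runs with the earlier 12-run figures, the same summary statistics can be worked out with a few lines of Python. This is only a sketch using the unique word counts listed above; the `summarize` helper is a hypothetical name.

```python
from statistics import mean, median

# Unique word counts from the six post-SP4 runs listed above.
unique_word_counts = [46997, 36005, 38896, 32451, 31806, 33381]


def summarize(counts):
    """Return min, max, rounded mean, median, and the max/min ratio."""
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": round(mean(counts)),
        "median": median(counts),
        "max_over_min": round(max(counts) / min(counts), 2),
    }


summary = summarize(unique_word_counts)
print(summary)
```

This gives a minimum of 31,806, a maximum of 46,997, a mean of about 36,589, and a median of 34,693. The largest count is about 1.48 times the smallest, compared with about 4.06 times (42,731 against 10,530) in the first set of 12 runs, so the spread is narrower but the count is still not stable.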
I also noticed, while checking through the pdf files, that I seem to get a
lot of false positives using CONTAINS. I presume this is a
related matter.
Do you have any suggestions as to what else I could try?
Hilary Cotter - 08 Feb 2008 14:17 GMT
This seems abnormal. The way it works is that there are temporary
memory-resident indexes, which are merged into shadow indexes and then
into a single master catalog.
The discrepancy might be coming from there.
Also, are you getting these same false positives when you use a
CONTAINS/CONTAINSTABLE query?