We are still struggling to get stable operation with full text indexing.
After a variety of problems that I have posted before, I am trying a smaller
table with about 3 million rows, and I'm using incremental builds rather
than change tracking.
At the end of an incremental population last night, the Gatherer logged the
following message in the event log:
Event Type: Information
Event Source: Microsoft Search
Event Category: Gatherer
Event ID: 3047
Date: 7/7/2004
Time: 8:35:30 AM
User: N/A
Computer: IDX
Description:
The end of the incremental crawl for project <SQLServer SQL0000700005> has
been detected. The Gatherer successfully processed 3474841 documents
totaling 0K. It failed to filter 262735 documents. 573712 documents were
modified. 262735 URLs could not be reached or were denied access.
The Gatherer error log contains a large number of entries similar to this:
/7/2004 5:37:28 AM mssql75://sqlserver/4bac3f29/41840c26d8000000 Add Error
fetching URL, (80041201 - The object was not found. )
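For anyone wanting to check whether all of these entries carry the same code, here is a minimal Python sketch that tallies them. The entry pattern is inferred only from the single sample above, so it is an assumption and real Gatherer logs may need a different pattern; the names ENTRY and tally_errors are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the one sample entry in this post (an assumption).
ENTRY = re.compile(
    r"(?P<url>mssql75://\S+)\s+(?P<action>\w+)\s+Error fetching URL,\s*"
    r"\((?P<code>[0-9A-Fa-f]{8})\s*-\s*(?P<text>[^)]*?)\s*\)"
)

def tally_errors(log_text):
    """Count error-log entries per (error code, message) pair."""
    # Entries wrap across lines, so collapse all whitespace to single spaces.
    flat = " ".join(log_text.split())
    return Counter((m["code"], m["text"]) for m in ENTRY.finditer(flat))

# The sample entry, copied as posted (including its clipped timestamp).
sample = """/7/2004 5:37:28 AM mssql75://sqlserver/4bac3f29/41840c26d8000000 Add Error
fetching URL, (80041201 - The object was not found. )"""

counts = tally_errors(sample)
for (code, text), n in counts.items():
    print(code, text, n)
```

If every entry comes back as 80041201, that would fit the idea that they all refer to rows that no longer exist in the table.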
Between the time of the original full population and this incremental,
several hundred thousand rows were added to and removed from the table being
indexed. It is quite possible that the 573712 documents reported as
modified in the above message is the sum of the rows added and removed.
And it is quite possible that the 262735 URLs that could not be reached
correspond to the number of rows that were removed from the table between
the full and incremental population.
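The arithmetic behind that guess can be sketched in Python. This only pulls the counts out of the event text quoted above and applies the hypothesis (modified = added + removed, unreachable = removed); it does not confirm what the Gatherer actually counts. The function name crawl_counts is hypothetical.

```python
import re

# Description text of event 3047, as quoted above.
description = (
    "The end of the incremental crawl for project <SQLServer SQL0000700005> has "
    "been detected. The Gatherer successfully processed 3474841 documents "
    "totaling 0K. It failed to filter 262735 documents. 573712 documents were "
    "modified. 262735 URLs could not be reached or were denied access."
)

def crawl_counts(text):
    """Extract the four counters from the event description."""
    patterns = {
        "processed": r"successfully processed (\d+) documents",
        "failed_filter": r"failed to filter (\d+) documents",
        "modified": r"(\d+) documents were modified",
        "unreachable": r"(\d+) URLs could not be reached",
    }
    return {name: int(re.search(p, text).group(1)) for name, p in patterns.items()}

c = crawl_counts(description)

# The "failed to filter" and "could not be reached" counts are identical.
same_count = c["failed_filter"] == c["unreachable"]

# Under the hypothesis only: rows added or changed since the full population.
implied_added_or_changed = c["modified"] - c["unreachable"]

print(same_count, implied_added_or_changed)  # True 310977
```

If the hypothesis holds, roughly 310977 rows were added or changed and 262735 were removed, which is in line with "several hundred thousand" in each direction.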
So the question is whether this is an error or just a note letting me know
that rows have been removed from the table.
Is there any problem using an incremental population on a table that has had
rows removed?
If rows are removed from the table and an incremental population is done,
does it correctly de-index the rows that were removed?
Note: I searched for this error message and found a number of reports
related to SharePoint. We are NOT using SharePoint. The column being
indexed has plain ASCII text.

Signature
Phil Sherrod
(phil.sherrod 'at' sandh.com)
http://www.dtreg.com (decision tree modeling)
http://www.nlreg.com (nonlinear regression)
Phil,
Q. So the question is whether this is an error or just a note letting me
know that rows have been removed from the table. Is there any problem using
an incremental population on a table that has had rows removed?
A. No, these are not errors. They are just what you suspect, i.e.,
informational messages about rows that are still in the FT Catalog but are
no longer in the SQL table, and that are deleted from the catalog during the
Incremental Population. This is also why an Incremental Population, even
with no changes, can sometimes take as long as a Full Population.
Q. If rows are removed from the table and an incremental population is done,
does it correctly de-index the rows that were removed?
A. Yes. Specifically, when a "row" exists in the FT Catalog but no longer
exists in the SQL table, the Incremental Population process detects this
and deletes the "row" (really, the list of unique, non-noise words
associated with the SQL table's primary key) from the FT Catalog.
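As a rough illustration of the reconciliation described above, here is a small Python model. It is a conceptual sketch only, not how MSSearch is implemented: the dictionaries, the "row version" values, and the function name incremental_population are all hypothetical stand-ins.

```python
def incremental_population(catalog, table):
    """Return the refreshed catalog and a summary of what changed.

    catalog: dict mapping primary key -> row version last indexed (hypothetical)
    table:   dict mapping primary key -> current row version (hypothetical)
    """
    removed = catalog.keys() - table.keys()   # in catalog, gone from table
    added = table.keys() - catalog.keys()     # in table, not yet indexed
    changed = {k for k in catalog.keys() & table.keys()
               if catalog[k] != table[k]}     # indexed, but row has changed

    # Drop catalog entries whose rows no longer exist, then (re)index the rest.
    refreshed = {k: v for k, v in catalog.items() if k not in removed}
    for k in added | changed:
        refreshed[k] = table[k]
    return refreshed, {"removed": removed, "added": added, "changed": changed}

catalog = {1: "v1", 2: "v1", 3: "v1"}   # state after the full population
table = {1: "v1", 3: "v2", 4: "v1"}     # row 2 deleted, 3 updated, 4 inserted
refreshed, stats = incremental_population(catalog, table)
print(stats)
print(refreshed == table)
```

In this model, key 2 plays the part of a row that produces "The object was not found" in the Gatherer log: it is looked up, not found in the table, and dropped from the catalog.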
SQL FTS and SharePoint share the same basic MSSearch technology, so while
there are some implementation and feature differences, most of the
errors are shared. I too would have hoped that by now there would be more KB
articles on SQL FTS from MS, but alas that is not the case...
Regards,
John
Phil Sherrod - 08 Jul 2004 16:42 GMT
John,
Thank you again for your detailed and helpful answer. I have posted a
message in this group recommending you for MVP status. You seem to be the
only person involved in this group who really knows how FT indexing works.
It would be nice if some Microsoft support people would contribute.
> Q. So the question is whether this is an error or just a note letting me
> know that rows have been removed from the table. Is there any problem
[quoted text clipped - 6 lines]
> is also why an Incremental Population, even with no changes can sometimes
> take as long as a Full Population.
I am happy to hear that they are not errors. It is unfortunate that the
gatherer specifically calls them "errors" and even displays an error code.
> Q. If rows are removed from the table and an incremental population is
> done,
[quoted text clipped - 5 lines]
> the
> FT Catalog.
OK, that's the way I hoped it worked.
I realize that Change Tracking with background updates is a better and more
efficient method than Incremental Population. As you know from previous
messages, we have been having a great deal of trouble getting FT indexing to
operate reliably with a large table containing about 10 million rows. In
desperation, I have reduced the table to about 4 million rows and also
changed from Change Tracking to Incremental Populations to see if we can
find a reliable approach. If this works reliably for a few days, I will
switch to Change Tracking/Background Updates and see if that works reliably
with a 4 million item index. If it does, I will then gradually increase the
size of the index a little each day.
We need to find some stable ground that we can stand on and drop back to if
problems occur. So far, FT indexing has been terribly unreliable with our
10 million row table: We have not gone for more than 2 days without losing
the entire index with one type of error or another. Interestingly, the
table being indexed is never lost or damaged; it is just the FT indexing
system that gets sick.
I truly hope that Microsoft has done a total redesign and rewrite of the FT
indexing system in Yukon. I had to laugh when I saw an article in The New
York Times today titled "Microsoft Sets Their Sights on Google". If they
are planning to go against Google with their current FT indexing technology,
I don't think they'll get very far.
As always, your help is much appreciated.