Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / June 2007

Tip: Looking for answers? Try searching our database.

search problem after upgrade from SQL2000 SP3 / w2k to SQL2000 SP4 / w2k3

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
tbh - 13 Jun 2007 09:15 GMT
hi,

we have recently upgraded a production database using the following trick
(to minimize downtime):
- we replicated the database from db004 (old)[*] to db04 (new)[**] (merge
replication, updates every minute i think)
- [incidentally, i don't think it matters] we took this opportunity to
improve our (German) noise words from the supplied default list to a much
better list [***]
- once that was settled we created and populated the necessary fulltext
indexes on the new server [we did these steps via EntMgr GUI -- not manually
with SPs]
- we found a good time to swing all our production processes and clients
from the old machine to the new

there were a couple of glitches, but this all went pretty well.

we were just about to clean up (breaking the replication, disposing of the
old DB) when (alas, far too late -- we should have tested better: famous
last words) we discovered a *bad* problem with searching in the new system:

CONTAINS searches are finding *way too many* hits. for example, searching
for "ehrensenf" (strange term) almost instantly finds about 5 hits in the
old DB. it times out or finds thousands of bogus hits in the new DB.

when i look at these bogus hits and try to fathom why the CONTAINS clause
succeded, the only clues i have found so far are that parts of the search
term are found (for example, "ehre", which is a word in its own right,
inside another word). but i have not asked for any special INFLECTIONAL
treatment or the like.

i'm looking for any clues about this (we have even opened a call to
Microsoft support, but they are evidently swamped at the moment, so I'm
trying this channel too; i plan to post a solution when we find one in case
anyone else runs into a similar problem.)

obviously several components have changed, which makes finding the culprit
non-trivial.

i'd be very grateful for any hints or ideas.

Tim Hanson

[*] old: SQL 2000 SP3 running on a Windows 2000 Server

[**] new: SQL 2000 SP4 running on a Windows 2003 Server

[***] we consulted http://www.ranks.nl/stopwords/german.html and
http://wortschatz.uni-leipzig.de/Papers/top100de.txt
Hilary Cotter - 13 Jun 2007 13:48 GMT
ehrensenf is stored in the index as ehrensenf , ehre, 1, and senf. Perhaps
in the old version you were using the English or the neutral word breaker.

Signature

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> hi,
>
[quoted text clipped - 46 lines]
> [***] we consulted http://www.ranks.nl/stopwords/german.html and
> http://wortschatz.uni-leipzig.de/Papers/top100de.txt
tbh - 13 Jun 2007 14:01 GMT
hi, Hilary, thanks!

that sounds very plausible. i forgot to mention that our original servers
were probably English and our new server is German. we ran into (and solved)
one unexpected apparently language-dependent varchar/datetime conversion
issue already.

how can i check how ehrensenf is stored in the index? why would it be stored
as ehre, 1, and senf??

cheers,

Tim

> ehrensenf is stored in the index as ehrensenf , ehre, 1, and senf. Perhaps
> in the old version you were using the English or the neutral word breaker.
tbh - 13 Jun 2007 14:35 GMT
p.s. a colleague found this reference, which seems to deal with what you're
describing.
   http://support.microsoft.com/?scid=kb%3Ben-us%3B274275&x=15&y=7

i'd still be very interested to find out how the stemming works, what words
are in my index, etc.

cheers,

Tim

> hi, Hilary, thanks!
>
[quoted text clipped - 9 lines]
>
> Tim
tbh - 14 Jun 2007 13:26 GMT
Update:

the tuning adjustment described in the document (basically, add "1" to your
noise word list) seems to solve our problem.

thanks again for the hint, Hilary!

i am curious what other magic values might affect how indexing works and
will try to learn more as time allows.

Tim

p.s. of course rebuilding the catalog takes about 6 hours in our case and
puts so much load on the DB, that you want to do it at night or offload the
DB while this is happening. so it would have been great to have had this
right on the first pass. live and learn. :)

> p.s. a colleague found this reference, which seems to deal with what
> you're describing.
[quoted text clipped - 6 lines]
>
> Tim
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.