Thanks for that info - it's essential, a must-know.
Do you by any chance have a list of characters ignored by FTS that aren't
included in noise files (e.g. punctuation marks)?
ML
---
http://milambda.blogspot.com/
Basically all alphanumeric characters are indexed. Hyphens and capitalization
are respected in some languages. In some languages the indexing process knows
that a character occurs after a single letter (e.g. C#), but doesn't index what
the character is, so a search on C# will also match C$.
Currency symbols change how a number is stored in the index, and so do
strings that look like dates.
Abbreviations are handled differently: for example, f.b.i is indexed as f, b,
and i, whereas F.B.I is indexed as both FBI and F.B.I.
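One way to check this behaviour on your own server is to ask the word breaker directly what terms it emits. The following is only a sketch under a few assumptions: it relies on sys.dm_fts_parser, which exists in SQL Server 2008 and later (not in SQL Server 2005), it assumes the English (US) word breaker (LCID 1033), and it needs sysadmin membership. The output depends on the version and language, so no expected results are shown.

```sql
-- Sketch only: requires SQL Server 2008 or later and sysadmin rights.
-- Arguments: query string, LCID (1033 = English US, an assumption here),
-- stoplist id (NULL = no stoplist, so single letters are not filtered out),
-- accent sensitivity (0 = insensitive).
SELECT 'C#' AS input, display_term, special_term
FROM sys.dm_fts_parser(N'"C#"', 1033, NULL, 0)
UNION ALL
SELECT 'C$', display_term, special_term
FROM sys.dm_fts_parser(N'"C$"', 1033, NULL, 0)
UNION ALL
SELECT 'f.b.i', display_term, special_term
FROM sys.dm_fts_parser(N'"f.b.i"', 1033, NULL, 0)
UNION ALL
SELECT 'F.B.I', display_term, special_term
FROM sys.dm_fts_parser(N'"F.B.I"', 1033, NULL, 0);
```

Comparing the display_term rows for C# and C$, and for the lowercase and uppercase abbreviations, shows whether your language and version behave as described above. Changing the LCID lets you compare word breakers for other languages.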
IMHO I did an ok job in this article discussing language options in SQL FTS.
http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/
If you are really interested in the internals of how this works in most
search engines, you might want to look at the code in Lucene or at the book
"Foundations of Statistical Natural Language Processing". There is another
book which is really good on this and presents algorithms, but I can't recall
its name right now.

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
ML - 14 Feb 2007 09:59 GMT
Thank you again! That article is now a permanent reference.
ML
---
http://milambda.blogspot.com/