Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / April 2006

Tip: Looking for answers? Try searching our database.

Full-text search for "make up" fails.

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Nick Gilbert - 24 Apr 2006 17:03 GMT
Hi,

I have a table which contains the following data in one row:

ID      Category                Keywords
16558    Make-Up Assistants    makeup make up assistants
35976    Make-Up Artists        makeup make up artists
35977    Make-Up Supplies    makeup make up supplies

However it seems impossible for users to search for these three
categories if they type "make up" despite the fact that the full text
index is on both columns (Category and Keywords).

My query is:

SELECT * FROM tblCategory
WHERE
CONTAINS(tblCategory.*, "(FORMSOF(INFLECTIONAL, "make"  ) OR "make*")
                AND (FORMSOF(INFLECTIONAL, "up"  ) OR "up*")
                     ")

Why is this not working? Is it to do with noise words?
How can I fix this problem?

Many thanks,

Nick...
Hilary Cotter - 24 Apr 2006 18:31 GMT
what happens when you search on make up? i.e.

select * from tablename where contains(*,'"Make Up"')

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Hi,
>
[quoted text clipped - 23 lines]
>
> Nick...
Nick Gilbert - 25 Apr 2006 09:38 GMT
> what happens when you search on make up? i.e.
>
> select * from tablename where contains(*,'"Make Up"')

I get no results. I can only think that "up" is a noise word and thus
hasn't been indexed. But I'm not really sure how I can fix it other than
to rewrite my query so it reads:

CONTAINS(blah)
OR (keywords like '%make up%')

..and effectively do both a full text search AND a normal LIKE search.
But that half undoes the whole point of using FTS in the first place.

I can't believe how little improvement they've made to FTS in SQL Server
in 2005. It still seems to have all the same problems it did in 2000.

Nick...
Hilary Cotter - 25 Apr 2006 14:57 GMT
Both make and up are noise words for US and UK English. Remove these from
the noise word lists, stop MSFTESQL, and rebuild the catalogs. It works for
me:)

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> what happens when you search on make up? i.e.
>>
[quoted text clipped - 14 lines]
>
> Nick...
Nick Gilbert - 25 Apr 2006 16:19 GMT
> Both make and up are noise words for US and UK English. Remove these from
> the noise word lists, stop MSFTESQL, and rebuild the catalogs. It works for
> me:)

Thanks Hilary. Now fixed. Why is it that on my English boxes, it always
seems to use the US English noise words file.

It's not a very good fix though - MS should have had an option to
disable noise words on a per-column basis while indexing. It seems
pretty pathetic that with 5 years of development, all they've come up
with on this problem is a facility to replace known noise words with "*"
  and called it "transform noise words". You'd think they could have
come up with something a little better. Oh well - maybe in 2010 they
might have actually made an attempt to fix this problem.

I also disagree that there is even such a thing as "noise word".  It all
depends on context. If you're indexing company names then many of the
words in company names are in the noise words file.

Nick...
Hilary Cotter - 25 Apr 2006 16:26 GMT
There is some debate within Microsoft on how to handle noise words. With
disk space being so cheap many people are considering whether to index them
at all. For example, MSN Search does not index noise words, Google still
does.

Microsoft chose to ship SQL Server 2005 with noise words, I don't know
exactly why, but I assume it was for backwards compatibility issues, and as
many people desire them.

I also agree with you about frequent or noisy words being added to the noise
word list - however most people end up removing words as opposed to adding
them.

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> Both make and up are noise words for US and UK English. Remove these from
>> the noise word lists, stop MSFTESQL, and rebuild the catalogs. It works
[quoted text clipped - 16 lines]
>
> Nick...
Nick Gilbert - 26 Apr 2006 10:55 GMT
> There is some debate within Microsoft on how to handle noise words. With
> disk space being so cheap many people are considering whether to index them
[quoted text clipped - 8 lines]
> word list - however most people end up removing words as opposed to adding
> them.

Well the easiest and neatest solution, would be if you could check a box
when you create the column index to tell the FTS crawler whether to
index the noise words or not. A simple "use noise words" checkbox that
was column or table specific would solve most of the problems that are
posted to this list. Programmatically speaking, this should be very easy
for MS to implement as it's just a flag which tells the crawler whether
or not it should use the noise words file for this column or not.

Nick...
Hilary Cotter - 26 Apr 2006 13:55 GMT
Yes, but the problem is that noise word lists are server specific. You can
not have them column specific unless you are indexing different languages
for each column.

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> There is some debate within Microsoft on how to handle noise words. With
>> disk space being so cheap many people are considering whether to index
[quoted text clipped - 18 lines]
>
> Nick...
Nick Gilbert - 27 Apr 2006 10:47 GMT
> Yes, but the problem is that noise word lists are server specific. You can
> not have them column specific unless you are indexing different languages
> for each column.

I realise that and that's what I'm complaining about :) It shouldn't be
like that. I don't mind a single noise words file - but you should be
able to disable it for certain columns or tables. Perhaps you should
suggest this to MS - you probably have more weight with them than I do
as you're an MVP. :)

Thanks very much for your time - as always you've been very helpful!
Don't you have a day job? :)

Nick...
Hilary Cotter - 27 Apr 2006 11:45 GMT
I have no more weight than you with them. They are very busy people and in
general not responsive. I understand their work loads and don't blame them.
I do have a day job, and a night job;)

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> Yes, but the problem is that noise word lists are server specific. You
>> can not have them column specific unless you are indexing different
[quoted text clipped - 10 lines]
>
> Nick...
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.