Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / April 2006

Tip: Looking for answers? Try searching our database.

relevance sorting with multiple search terms?

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
msft-sql - 19 Apr 2006 22:40 GMT
Hi: I'm trying to get a handle on the best way to approach this issue.

I have a product database of candy with perhaps 5000 products, and I'm
indexing the product name and description fields (both varchars).

People will search for "easter candy" for example.

Splitting the string and using a contains query on both terms will usually
produce too broad a result, returning every product with "candy" in the
name. A freetext query is also too broad.

Searching with an "AND" is too limiting, since I want to return "chocolate
easter egg" even if "candy" is nowhere in the name or description.

A proximity search can be too limiting as well, because the words could be
completely separate in the description, e.g. "These chocolate easter eggs
are the perfect type of candy for ..."

I've tried a weighted search, like:
select * from products

where contains (name, 'isabout(easter Weight(1.0), candy Weight(0.0))')

...but that produces the same results even if I reverse the weighting for
the two terms. Maybe I'm not writing the query correctly?

I'm wondering if other people have dealt with this before? I could add words
like "candy", "bar", "bulk", "package", etc. to the noise list, but I don't
want to exclude them all the time; e.g., if someone searches for "bulk
chocolate" I don't want to drop the word "bulk" and then return every
instance of "chocolate". Similarly I don't want to return hits on "bulk
lemon drops" when I'm searching for "bulk chocolate"

It seems like weighting is the way to go, and I can even maintain an array
of low-weight words and dynamically assign the weight to them when building
the query, but it doesn't seem to work properly.

Any suggestions would be appreciated!
Hilary Cotter - 20 Apr 2006 03:17 GMT
If you know in advance all the possibilities of the search terms, I would
use the expansion type of the thesaurus option.

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Hi: I'm trying to get a handle on the best way to approach this issue.
>
[quoted text clipped - 34 lines]
>
> Any suggestions would be appreciated!
Simon Sabin - 21 Apr 2006 11:43 GMT
We've done what we call phrase matching, So we take a search i.e. Project
Manager. and look at our table of phrases. if it exists we do a search on
"Project manager" or Project or manager. Project manager jobs will get a
higher ranking than those with something like you will work on a project and
report to a manager.

The challenge is building your phrase list

Signature

Simon Sabin
SQL Server MVP
http://sqljunkies.com/weblog/simons

> If you know in advance all the possibilities of the search terms, I would
> use the expansion type of the thesaurus option.
[quoted text clipped - 38 lines]
>>
>> Any suggestions would be appreciated!
msft-sql - 21 Apr 2006 20:30 GMT
> We've done what we call phrase matching, So we take a search i.e. Project
> Manager. and look at our table of phrases. if it exists we do a search on
[quoted text clipped - 3 lines]
>
> The challenge is building your phrase list

I'm sort of doing that now but without utilizing the full-text query:

A) I search for the exact string first
B) if there are no hits, then I split it and search again by building a
query with "AND" between all the words
C) if that fails I run the query again searching with "OR"

Before running the query, I'm also rewriting the words with some rules to
remove plurals, break up hyphenated words, etc.

If there's a hit on A, I return just that match, with the option to expand
the search, at which point I rerun the query using B, and then C.

Problem is, if A and B return no results, then C returns too many. And, of
course, there is no weighting of the search results.

Question for you Simon: Wouldn't "NEAR" achieve the same effect as a custom
lookup table?

Unfortunately the method you're suggesting in not practical for me. Unless,
I start to save all the search terms in a lookup table, and check that table
first...but that seems unwieldly, and also would seem to be replicating what
the full-text index is doing in the first place?

Or is what I'm after is beyond the capability of full-text searching?

Thanks for you comments!

>> If you know in advance all the possibilities of the search terms, I would
>> use the expansion type of the thesaurus option.
[quoted text clipped - 38 lines]
>>>
>>> Any suggestions would be appreciated!
msft-sql - 21 Apr 2006 20:19 GMT
> If you know in advance all the possibilities of the search terms, I would
> use the expansion type of the thesaurus option.

Hi Hilary: I'm not sure I understand what you mean by "all the
possibilities"...This is a user-controlled search tool...they're just typing
in stuff they're looking for.

I can compile a sort-of "actionable words" list for words that should be
"demoted" in relevance, like:

bulk
candy
bar

...but I don't know if that's what you mean?

Or do you mean creating my own custom dictionary of search "keywords" with
weightings?

Or are you talking about something other than weighting altogether?

>> Hi: I'm trying to get a handle on the best way to approach this issue.
>>
[quoted text clipped - 35 lines]
>>
>> Any suggestions would be appreciated!
Hilary Cotter - 24 Apr 2006 15:53 GMT
I was talking a building a thesaurus. For people might search on chocolate
bars, chocco bars, bars, candy bars, and they mean the same thing. So
someone searching on chocolate bars, would be best served by expanding their
search to all synonyms of the phrase chocolate bars.

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

>> If you know in advance all the possibilities of the search terms, I would
>> use the expansion type of the thesaurus option.
[quoted text clipped - 56 lines]
>>>
>>> Any suggestions would be appreciated!
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.