Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / January 2005

Tip: Looking for answers? Try searching our database.

FreeTextTable Rank

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
msnews.microsoft.com - 27 Jan 2005 06:02 GMT
Hi. I have a question - does anyone know on what criteria is the rank set by
FreeTextTable ? I created some records in the table "titles" containing the
words "drink", "much" and "abstinence" and.... well.... I can find no rhyme
or reason to the way this rank is established. I got the ranks 293, 266, 266
and 154. The funny thing is that a title like "No *abstinence* for me,
please !" has a higher rank than "ABSTINENCE for dummies." with the word in
capitals.

So... how exactly is this rank set ? How reliable is it ?

I'm "afraid" to filter the returns (for instance TOP N) and I am "afraid" to
set conditions like "where rank >...." (of course, I could always show all
the hits and page the datagrid).

Thanks a lot.
Alex.
msnews.microsoft.com - 27 Jan 2005 06:29 GMT
After some searching, I found this (for SQL2005):

Ranking of FREETEXT

Freetext ranking is based on the OKAPI BM25 ranking formula. Each term in
the query is ranked, and the values are summed. Freetext queries will add
words to the query via inflectional generation (stemmed forms of the
original query terms); these words are treated as separate terms with no
special weighting or relationship with the words from which they were
generated. Synonyms generated from the Thesaurus feature are treated as
separate, equally weighted terms.

Rank = ?[Terms in Query] w ( ( ( k1 + 1 ) tf ) / ( K + tf ) ) * ( ( k3 + 1 )
qtf / ( k3 + qtf ) ) )

Where:
w is the Robertson-Sparck Jones weight.

Originally, w is defined as:
w = log10 ( ( ( r + 0.5 ) * ( N - n - R + r + 0.5 ) ) / ( ( R - r + 0.5 ) *
( n - r + 0.5 ) ) )
This was simplified to:
w = log10 ( ( ( r + 0.5 ) * ( N - R + r + 0.5 ) ) / ( ( R - r + 0.5 ) * (
n - r + 0.5 ) ) )

R is the number of documents marked relevant by a user. This is not
implemented in SQL Server 2005 full-text search, and thus is ignored.
r is the number of documents marked relevant by a user containing the term.
This is not implemented.
N is the number of documents with values for the property in the query.
n is the number of documents containing the term.
K is ( k1 * ( ( 1 - b ) + ( b * dl / avdl ) ) )
dl is the document length, in word occurrences.
avdl is the average document length of the property over which the query
spans, in word occurrences.
k1, b, and k3 are the constants 1.2, 0.75, and 8.0, respectively.
tf is the frequency of the term in a specific document.
qtf is the frequency of the term in the query.

So now I know :-)))) Anyway, apparently the ranking results have to be taken
with a *BIG" grain of salt.

Alex.
John Kane - 27 Jan 2005 07:22 GMT
Alex,
Yep, so you found (via Google?) the SQL 2005 FTS paper published in Dec
2004. While it documents SQL 2005, I *believe* that as far as the contains
and freetext ranking formula's that they hold true for SQL Server 2000 as
well. Even so, I'm surprised that for SQL Server 2005 (or for that matter
SQL 2000) that relevance feedback (R & r in the formula) was not implemented
as there are standard methods using T-SQL and feedback tables that can be
use to implement automatic relevance feedback...

Oh, and that *BIG" grain of salt, that you speak of, not necessary... You
now know the formula, and with your table's unique non-noise words, you can
calculate the rank values from your query by yourself!

Regards,
John
Signature

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

> After some searching, I found this (for SQL2005):
>
[quoted text clipped - 39 lines]
>
> Alex.
John Kane - 27 Jan 2005 07:03 GMT
Hi Alex,
Yes, I do, but it is complex and you need to understand a bit about basic
Informational Retrieval theory. The Rank value from FREETEXTTABLE is based
upon what is known as OKAPI or BM25 that was developed by Stephen Robertson
(http://research.microsoft.com/users/robertson/). You can see some of the
formula documented in "SQL Server 2005 Full-Text Search: Internals and
Enhancements" at:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sq
l2005ftsearch.asp

Specifically, under "Ranking of FREETEXT" -

"Freetext ranking is based on the OKAPI BM25 ranking formula. Each term in
the query is ranked, and the values are summed. Freetext queries will add
words to the query via inflectional generation (stemmed forms of the
original query terms); these words are treated as separate terms with no
special weighting or relationship with the words from which they were
generated. Synonyms generated from the Thesaurus feature are treated as
separate, equally weighted terms."...

See also http://wickedsmrt.blogspot.com/2003_05_01_wickedsmrt_archive.html
"From MS Newsgroups: However it was my understanding that rank is based on
this formula:
? W(i)=(K1+1) ?idf(i)?(K2+1) ?tf(i.j) /( K1?[(1-b)+b?dl
(j)/avdl])?K3(tf(i.j))

W(i) - rank from each term in the search phrase
idf(i) - iS the inverse document frequency of term i
tf(i,j) - is the term frequency for term i, in row j
K1,K2,K3 - are constants
dl - is row/column length in words
AVdl - is the average row/column length length in words"

Another factor is the number of rows and the number of unique non-noise
words per row as you must have a statistically significant number of rows
(at least 10,000) for the OKAPI BM25 Freetexttable Rank value to be
meaningful. How many rows are in your table "titles"? Also keep in mind that
the Rank values are specific to your freetext query and primarily useful for
ordering of the results. See SQL Server 2000 BOL title "Full-text Search
Recommendations" - "What is RANK and how is it determined when used with
CONTAINSTABLE and FREETEXTTABLE predicates?..." for more info.

Yes, OKAPI BM25 is very reliable, but complex. Note, that for US English,
SQL FTS is case insensitive and "ABSTINENCE" will have the same rank value
as "abstinence" with all other factors being equal. It is complex as you can
see, but what is your true objective? Could you provide the exact Freetext
query with sample data and results as well as the full output of SELECT
@@version ?

Hope that helps!
John
Signature

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

> Hi. I have a question - does anyone know on what criteria is the rank set by
> FreeTextTable ? I created some records in the table "titles" containing the
[quoted text clipped - 12 lines]
> Thanks a lot.
> Alex.
msnews.microsoft.com - 27 Jan 2005 07:24 GMT
Hello, John. Thank you for your reply.

Yes, I understand. Unfortunately, I'm using the pubs db, and I only have
about 50 titles in the 'Titles' table.
Here are the results of my query:
--------------------------------------------------------------------------------------------------------------------------
Rank/Title/Notes/First Name/Last Name

375/We don't drink that much/Essay on abstinence./Napoleon/Borcan

266/Que c'est bon, c'est bon, c'est bon !/Essai sur les joies de boire
beaucoup. Abstinents s'abstenir (eng: abstinence)./Pisica/Rindunel

121/No *abstinence* for me, please !/A smoker's paradise. Smoke-smoke-smoke,
boy, oh, isn't it a joy ?/Alberta/Curisor

121/ABSTINENCE for dummies./An epicurian's guide./Pupu/Balacarescu
--------------------------------------------------------------------------------------------------------------------------

and this is the query, which is correct:

--------------------------------------------------------------------------------------------------------------------------
strSearch = "SELECT " & _
               "SearchTable.[Rank], Titles.title as Title, Titles.notes as
Notes, Authors.au_fname as [First Name], Authors.au_lname as [Last Name]" &
_
               "FROM " & _
               "FREETEXTTABLE(Titles, *, '" & strText & "') as SearchTable
" & _
               "INNER JOIN Titles ON SearchTable.[Key] = Titles.title_id "
& _
               "INNER JOIN TitleAuthor ON Titles.title_id =
TitleAuthor.title_id " & _
               "INNER JOIN Authors ON TitleAuthor.au_id = Authors.au_id " &
_
               "ORDER BY SearchTable.[Rank] DESC"
--------------------------------------------------------------------------------------------------------------------------
The version is:
Microsoft SQL Server  2000 - 8.00.194 (Intel X86)   Aug  6 2000 00:57:48
Copyright (c) 1988-2000 Microsoft Corporation  Developer Edition on Windows
NT 5.1 (Build 2600: Service Pack 2)

Alex.

> Hi Alex,
> Yes, I do, but it is complex and you need to understand a bit about basic
[quoted text clipped - 72 lines]
>> Thanks a lot.
>> Alex.
John Kane - 27 Jan 2005 07:47 GMT
You're welcome, Alex,
Yes, I thought so. While the pubs and northwind database tables are good
examples for experimenting with SQL FTS queries, they are not large enough
to be used effectively with containstable or freetextable and RANK as you
need production level table sizes to get meaningful Ranking results from the
SQL FTS queries such as the one below.

FYI, I'd recommend that you apply the latest service pack to your SQL Server
2000 (8.00.194) Developer's Edition on WinXP SP2 as the build (194) of SQL
Server 2000 that you are using has no service packs applied and you may be
open security bugs with this RTM version.

Signature

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

> Hello, John. Thank you for your reply.
>
[quoted text clipped - 48 lines]
> > formula documented in "SQL Server 2005 Full-Text Search: Internals and
> > Enhancements" at:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sq
l2005ftsearch.asp

> > Specifically, under "Ranking of FREETEXT" -
> >
[quoted text clipped - 63 lines]
> >> Thanks a lot.
> >> Alex.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.