Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / May 2007

Tip: Looking for answers? Try searching our database.

Fulltext query with custom rank

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
DC - 14 May 2007 16:41 GMT
Hi,

I brought this up once ago, but I read something which might open a
new possibility. What I am trying to do is this:

select p.Name, p.Score, ft.Rank
from Products p
join (
       select [key], rank from
       containstable(Products, Name, '"Screw*"', 10)
) as ft
on ft.[key]= p.ProductId
order by p.Score desc

Let's say "Products" contains one million products and there are 20000
matching products containing "screw" as a part of their name. What I
want to get are the top 10 products matching the query, but the FT
rank should equal the Score rank. I don't want this:

containstable(Products, Name, '"Screw*"', 50000)

and then join the resulting 20000 rows with products and order the set
by Score, since that will be too expensive.

Best would be, if I could actually set the value that the ranking
algorithm is based on. The indexer would simply use Score as the
predominant factor for the ranking. So

containstable(Products, Name, '"Screw*"', 10)

would return the first 10 matches but sorted by Score.

>From what I know this is not possible with FT in 2000 or 2005, but I
read that it is possible to use CLR integration to customize the
indexing process. However, I could not find anything in the docs about
that. Maybe one can only create indexes and stuff like that, but I am
still hoping that someone has a clue on how to possibly manipulate FT
rank in SQL Server 2005.

Kind regards
DC
ynogin - 15 May 2007 19:41 GMT
DC,
I've created my own CLR function and use it instead of FTS rank. It gives
much better result (by similarity)
I use Levenshtein Edit distance to calculate the score.

Thanks,

Yuri

> Hi,
>
[quoted text clipped - 37 lines]
> Kind regards
> DC
DC - 15 May 2007 21:21 GMT
Hi Yuri,

very interesting, were you able to manipulate the fulltext index
ranking, i.e. would

containstable(Products, Name, '"Screw*"', 10)

return the top 10 results as calculated by your algorithm?

Do you maybe have a link or a topic to look for about this kind of CLR
integration?

Regards
DC

> DC,
> I've created my own CLR function and use it instead of FTS rank. It gives
[quoted text clipped - 48 lines]
>
> - Zitierten Text anzeigen -
ynogin - 16 May 2007 16:14 GMT
DC,
The syntax I use is more like:

Select top(@n_rows) *,my_score_fn(your_column,@your_value)
from your_table
where containstable(your_column,@your_value)
order by my_score_fn(your_column,@your_value) desc

You also can use freetexttable function instead.

Thanks,
Yuri

> Hi Yuri,
>
[quoted text clipped - 63 lines]
> >
> > - Zitierten Text anzeigen -
DC - 18 May 2007 09:46 GMT
Hi Yuri,

if containstable(your_column,@your_value) returns 100.000 results,
then my_score_fn will have to do a lot of sorting.

This is exactly the problem that I am facing, where out of about one
million indexed rows easily 10 to 100 thousand results are being
returned by weak queries.

For your function my_score_fn it would probably not be of much value
to do what I am trying, since my_score_fn takes the search argument as
a parameter. So you cannot precalculate your desired ranking and store
it into an additional column. But my "Score" is totally independant of
the search argument. The RANK that sql server calculates is useless
for me. Since

containstable(Products, Name, '"Screw*"', 10)

gives the top 10 results by RANK, I only see the chance to manipulate
RANK generation somehow. But I apprehend that this is not possible at
all.

Regards
DC

> DC,
> The syntax I use is more like:
[quoted text clipped - 78 lines]
>
> - Zitierten Text anzeigen -
ynogin - 18 May 2007 14:16 GMT
DC,
You actually can put a trash holder like

Select ....

Where score_fn(search_column,@value) > 30 --in percent
order by ...

This way you will cut out most of the low "similarity" matches and your
result set will be a lot smaller.

Thanks,

Yuri

> Hi Yuri,
>
[quoted text clipped - 103 lines]
> >
> > - Zitierten Text anzeigen -
DC - 18 May 2007 14:44 GMT
Hi Yuri,

if I do this:

select p.name, p.score
from containstable(products, name, '"al*"') ft
join products p on p.id = ft.[Key] and p.score > 100

it will still take the fulltext service a long time to dig up ALL
products matching "al*" and it will filter the ones with a score of
100 or less afterwards.

This, on the other hand, will be a lot faster:

select p.name, p.score
from containstable(products, name, '"al*"', 200) ft
join products p on p.id = ft.[Key] and p.score > 100

because the ft index only returns the first 200 rows matchin "al*" and
then the p.score > 100 clause will be applied... so there is probably
no match left.

My whole point is, that I want this

containstable(products, name, '"al*"', 200)

to return the first 200 products matching "al*" but sorted by "score
desc". Normally

containstable(products, name, '"al*"', 200)

will return the first 200 matches sorted by RANK, and that RANK is
being calculated by ft index engine. What I want is to produce a
custom RANK.

Regards
DC

> DC,
> You actually can put a trash holder like
[quoted text clipped - 120 lines]
>
> - Zitierten Text anzeigen -
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.