Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / October 2004

Tip: Looking for answers? Try searching our database.

containstable, top_n_rank, and additional where clause combination causes unexpected result

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Dot net work - 28 Oct 2004 09:48 GMT
Hello,

Something strange happens if I try and do the following:

If I use the containstable function, with top_n_rank, along with an
extra "and where" clause, then the top_n_rank does not seem to return
the correct number of rows.

Here is an example to explain my point:

select *, tempidentity = IDENTITY (INT) into #tempt
from Store_BasicSearchableShelves,
containstable(Store_BasicSearchableShelves, ShelfName, @sSearchText,
@nextitemsrecpointer) tblSearchResults
where [key] = Store_BasicSearchableShelves.ShelfId
and ParentAisleId = @PID
order by rank desc

If @nextitemsrecpointer is equal to 20, then the stored proc only
returns 14 rows.  If I remove the additional "and ParentAisleId =
@PID" where clause, then the stored proc returns 20 rows, which is
correct.  If I begin to edit the value of @nextitemsrecpointer for
experimentational purposes to a higher value such as 30 or 40, then
the stored proc returns more rows -  18 and 25 respectively.

The additional "and ParentAisleId = @PID" seems to be affecting the
way that top_n_rank is behaving.

Can anyone please provide me with a work around for this?

Thank you,
Regards, dnw.
Hilary Cotter - 28 Oct 2004 12:18 GMT
Your problem is that when you limit your result set that is returned  from
MSSearch further rows are removed by the  "and ParentAisleId = @PID" clause.

The approaches to this problem are 1) knowing in advance the number of rows
which are returned by MSSearch for this query and entering this value for
@nextitemrecpointer, 2) partitioning your table into multiple partitioned
tables one for each PartentAisleID value so your query would end up looking
something like this:

if @ParentAisleID=1
begin
select *, tempidentity = IDENTITY (INT) into #tempt
from Store_BasicSearchableShelves_1,
containstable(Store_BasicSearchableShelves, ShelfName, @sSearchText,
@nextitemsrecpointer) tblSearchResults
where [key] = Store_BasicSearchableShelves.ShelfId
order by rank desc
end

if @ParentAisleID=2
begin
select *, tempidentity = IDENTITY (INT) into #tempt
from Store_BasicSearchableShelves_2,
containstable(Store_BasicSearchableShelves, ShelfName, @sSearchText,
@nextitemsrecpointer) tblSearchResults
where [key] = Store_BasicSearchableShelves.ShelfId
order by rank desc
end

3) picking a larger number which will guarantee a larger number of hits that
when filtered by @ParentAisleID will yield at least @nextitemsrecpointer
hits. You have to be careful here as you don't want to pick too large a
number to guarantee hits.

In large search applications partitioning in frequently used and it does
work very well. They will often have a seperate catalog for each of the
partitioned tables and you will then get a seperate threads for each catalog
which will improve your overall querying and indexing.

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

> Hello,
>
[quoted text clipped - 28 lines]
> Thank you,
> Regards, dnw.
John Kane - 28 Oct 2004 19:03 GMT
DNW,
Could you post the SQL Server and OS platform information from -- SELECT
@@version -- as well as a row count from your table
Store_BasicSearchableShelves? As all of this information is important in
first understanding your environment before making recommendations as well
as understanding the existing RANK values that are returned from your query.

The simple answer to your as why you query is not returning the expected
number of rows when using Top_N_Rank and with an additional WHERE clause is
that all of the WHERE clause parameters are applied AFTER the MSSearch
service returns the Top_N_Rank (not Top_N_Row... but "top_n_by_RANK").
Additionally, and depending upon the number of rows in your table, the
actual number or "top" values for RANK may be not what you expect as in
order to calculate rank, a statically large number of rows need to be
present.

You should also review the following KB article on the use and cautions of
using Top_N_Rank: 240833 (Q240833) "FIX: Full-Text Search Performance
Improved via Support for TOP" at
http://support.microsoft.com//default.aspx?scid=kb;EN-US;240833

Regards,
John

> Your problem is that when you limit your result set that is returned  from
> MSSearch further rows are removed by the  "and ParentAisleId = @PID" clause.
[quoted text clipped - 67 lines]
> > Thank you,
> > Regards, dnw.
Dot net work - 29 Oct 2004 15:22 GMT
Hi John,

>>Could you post the SQL Server and OS platform information from -- SELECT
@@version

Microsoft SQL Server  2000 - 8.00.760 (Intel X86)   Dec 17 2002
14:22:05   Copyright (c) 1988-2003 Microsoft Corporation  Developer
Edition on Windows NT 5.0 (Build 2195: Service Pack 4)

>>as well as a row count from your table Store_BasicSearchableShelves?

At the moment, only 39.

>>The simple answer to your as why you query is not returning the expected
number of rows when using Top_N_Rank and with an additional WHERE
clause is
that all of the WHERE clause parameters are applied AFTER the MSSearch
service returns the Top_N_Rank (not Top_N_Row... but "top_n_by_RANK").

As I am a newbie, please can you explain the difference between those
3 things please - Top_N_Rank, Top_N_Row and top_n_by_RANK.  Thanks.

Thank you,
Regards, dnw.

> DNW,
> Could you post the SQL Server and OS platform information from -- SELECT
[quoted text clipped - 96 lines]
> > > Thank you,
> > > Regards, dnw.
John Kane - 29 Oct 2004 17:36 GMT
Thanks, DNW,
While the version (SQL & OS platform) are less important for your questions,
overall the OS platform is most important for understanding expected FTS
query results when searching on specific words &/or punctuation characters
due to OS-specific wordbreaker issues, see
http://groups.google.com/groups?q=langwrbk+infosoft for details.

However, in this case the row count is the most important factor, especially
when used with Top_N_Rank. I'd recommend that you review SQL Server 2000 BOL
title "Full-Text Search Recommendations" and the next to last paragraph on
RANK for a better understanding of how RANK is calculated in SQL Sever 2000.
Rank needs a "statistically significant" number of rows (and therefore
number of unique non-noise words) in order to be useful and 39 rows is not a
"statistically significant" number of rows. While it may depend upon the
number of unique non-noise words, the number of rows is important as well,
and at least 10,000+ rows are generally the recommended number of rows to
start using RANK and you won't need Top_N_Rank for performance reasons
until at least 1 million rows.

As for "Top_N_Rank, Top_N_Row and top_n_by_RANK", there is only Top_N_Rank,
the other two were only metaphors that I used in my explanation as while
Top_N_Rank does limit the number of rows returned it is in fact a limit for
N (some number) of rows returned by RANK and not explicitly a row limiter as
is Top. Sorry, for the confusion, but with only 39 rows, I'd recommend that
you do not use Top_N_Rank as it was added as a fix in SQL Server 7.0 (and
included in SQL Server 2000) to improve the FTS query performance when used
against very large (1 to 2+ million) row tables that can generate large FT
Catalogs. See KB article 240833 (Q240833) for more info.

Again, thanks for providing the @@version as well as the row count info!
John

> Hi John,
>
[quoted text clipped - 123 lines]
> > > > Thank you,
> > > > Regards, dnw.
Dot net work - 30 Oct 2004 15:14 GMT
That info was really interesting!  Thanks a lot.
-dnw.

> Thanks, DNW,
> While the version (SQL & OS platform) are less important for your questions,
[quoted text clipped - 170 lines]
> > > > > Thank you,
> > > > > Regards, dnw.
Dot net work - 28 Oct 2004 19:50 GMT
Thanks a lot for your advice.
-dnw.

> Your problem is that when you limit your result set that is returned  from
> MSSearch further rows are removed by the  "and ParentAisleId = @PID" clause.
[quoted text clipped - 67 lines]
> > Thank you,
> > Regards, dnw.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.