SQL Server Forum / Other Technologies / Full-Text Search / October 2004
containstable, top_n_rank, and additional where clause combination causes unexpected result
|
|
Thread rating:  |
Dot net work - 28 Oct 2004 09:48 GMT Hello,
Something strange happens if I try and do the following:
If I use the containstable function, with top_n_rank, along with an extra "and where" clause, then the top_n_rank does not seem to return the correct number of rows.
Here is an example to explain my point:
select *, tempidentity = IDENTITY (INT) into #tempt from Store_BasicSearchableShelves, containstable(Store_BasicSearchableShelves, ShelfName, @sSearchText, @nextitemsrecpointer) tblSearchResults where [key] = Store_BasicSearchableShelves.ShelfId and ParentAisleId = @PID order by rank desc
If @nextitemsrecpointer is equal to 20, then the stored proc only returns 14 rows. If I remove the additional "and ParentAisleId = @PID" where clause, then the stored proc returns 20 rows, which is correct. If I begin to edit the value of @nextitemsrecpointer for experimentational purposes to a higher value such as 30 or 40, then the stored proc returns more rows - 18 and 25 respectively.
The additional "and ParentAisleId = @PID" seems to be affecting the way that top_n_rank is behaving.
Can anyone please provide me with a work around for this?
Thank you, Regards, dnw.
Hilary Cotter - 28 Oct 2004 12:18 GMT Your problem is that when you limit your result set that is returned from MSSearch further rows are removed by the "and ParentAisleId = @PID" clause.
The approaches to this problem are 1) knowing in advance the number of rows which are returned by MSSearch for this query and entering this value for @nextitemrecpointer, 2) partitioning your table into multiple partitioned tables one for each PartentAisleID value so your query would end up looking something like this:
if @ParentAisleID=1 begin select *, tempidentity = IDENTITY (INT) into #tempt from Store_BasicSearchableShelves_1, containstable(Store_BasicSearchableShelves, ShelfName, @sSearchText, @nextitemsrecpointer) tblSearchResults where [key] = Store_BasicSearchableShelves.ShelfId order by rank desc end
if @ParentAisleID=2 begin select *, tempidentity = IDENTITY (INT) into #tempt from Store_BasicSearchableShelves_2, containstable(Store_BasicSearchableShelves, ShelfName, @sSearchText, @nextitemsrecpointer) tblSearchResults where [key] = Store_BasicSearchableShelves.ShelfId order by rank desc end
3) picking a larger number which will guarantee a larger number of hits that when filtered by @ParentAisleID will yield at least @nextitemsrecpointer hits. You have to be careful here as you don't want to pick too large a number to guarantee hits.
In large search applications partitioning in frequently used and it does work very well. They will often have a seperate catalog for each of the partitioned tables and you will then get a seperate threads for each catalog which will improve your overall querying and indexing.
 Signature Hilary Cotter Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html
> Hello, > [quoted text clipped - 28 lines] > Thank you, > Regards, dnw. John Kane - 28 Oct 2004 19:03 GMT DNW, Could you post the SQL Server and OS platform information from -- SELECT @@version -- as well as a row count from your table Store_BasicSearchableShelves? As all of this information is important in first understanding your environment before making recommendations as well as understanding the existing RANK values that are returned from your query.
The simple answer to your as why you query is not returning the expected number of rows when using Top_N_Rank and with an additional WHERE clause is that all of the WHERE clause parameters are applied AFTER the MSSearch service returns the Top_N_Rank (not Top_N_Row... but "top_n_by_RANK"). Additionally, and depending upon the number of rows in your table, the actual number or "top" values for RANK may be not what you expect as in order to calculate rank, a statically large number of rows need to be present.
You should also review the following KB article on the use and cautions of using Top_N_Rank: 240833 (Q240833) "FIX: Full-Text Search Performance Improved via Support for TOP" at http://support.microsoft.com//default.aspx?scid=kb;EN-US;240833
Regards, John
> Your problem is that when you limit your result set that is returned from > MSSearch further rows are removed by the "and ParentAisleId = @PID" clause. [quoted text clipped - 67 lines] > > Thank you, > > Regards, dnw. Dot net work - 29 Oct 2004 15:22 GMT Hi John,
>>Could you post the SQL Server and OS platform information from -- SELECT @@version
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002 14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Developer Edition on Windows NT 5.0 (Build 2195: Service Pack 4)
>>as well as a row count from your table Store_BasicSearchableShelves? At the moment, only 39.
>>The simple answer to your as why you query is not returning the expected number of rows when using Top_N_Rank and with an additional WHERE clause is that all of the WHERE clause parameters are applied AFTER the MSSearch service returns the Top_N_Rank (not Top_N_Row... but "top_n_by_RANK").
As I am a newbie, please can you explain the difference between those 3 things please - Top_N_Rank, Top_N_Row and top_n_by_RANK. Thanks.
Thank you, Regards, dnw.
> DNW, > Could you post the SQL Server and OS platform information from -- SELECT [quoted text clipped - 96 lines] > > > Thank you, > > > Regards, dnw. John Kane - 29 Oct 2004 17:36 GMT Thanks, DNW, While the version (SQL & OS platform) are less important for your questions, overall the OS platform is most important for understanding expected FTS query results when searching on specific words &/or punctuation characters due to OS-specific wordbreaker issues, see http://groups.google.com/groups?q=langwrbk+infosoft for details.
However, in this case the row count is the most important factor, especially when used with Top_N_Rank. I'd recommend that you review SQL Server 2000 BOL title "Full-Text Search Recommendations" and the next to last paragraph on RANK for a better understanding of how RANK is calculated in SQL Sever 2000. Rank needs a "statistically significant" number of rows (and therefore number of unique non-noise words) in order to be useful and 39 rows is not a "statistically significant" number of rows. While it may depend upon the number of unique non-noise words, the number of rows is important as well, and at least 10,000+ rows are generally the recommended number of rows to start using RANK and you won't need Top_N_Rank for performance reasons until at least 1 million rows.
As for "Top_N_Rank, Top_N_Row and top_n_by_RANK", there is only Top_N_Rank, the other two were only metaphors that I used in my explanation as while Top_N_Rank does limit the number of rows returned it is in fact a limit for N (some number) of rows returned by RANK and not explicitly a row limiter as is Top. Sorry, for the confusion, but with only 39 rows, I'd recommend that you do not use Top_N_Rank as it was added as a fix in SQL Server 7.0 (and included in SQL Server 2000) to improve the FTS query performance when used against very large (1 to 2+ million) row tables that can generate large FT Catalogs. See KB article 240833 (Q240833) for more info.
Again, thanks for providing the @@version as well as the row count info! John
> Hi John, > [quoted text clipped - 123 lines] > > > > Thank you, > > > > Regards, dnw. Dot net work - 30 Oct 2004 15:14 GMT That info was really interesting! Thanks a lot. -dnw.
> Thanks, DNW, > While the version (SQL & OS platform) are less important for your questions, [quoted text clipped - 170 lines] > > > > > Thank you, > > > > > Regards, dnw. Dot net work - 28 Oct 2004 19:50 GMT Thanks a lot for your advice. -dnw.
> Your problem is that when you limit your result set that is returned from > MSSearch further rows are removed by the "and ParentAisleId = @PID" clause. [quoted text clipped - 67 lines] > > Thank you, > > Regards, dnw.
|
|
|