SQL Server Forum / Other Technologies / Full-Text Search / February 2005
Which search happen first ? (Yukon full text search)
|
|
Thread rating:  |
Xin Chen - 25 Feb 2005 07:06 GMT In Yukon, if I do a fulltext search and also put let's say "userid = 5" in the where clause, for example, [Where contains("description","yukon") and userid = 5] Does Yukon search for all the "yukon" among all the datarows first or search for userid =5 first?
Obviously search for userid = 5 first then full text search will be faster, How does Yukon handle this ?
--Xin Chen
Hilary Cotter - 25 Feb 2005 15:27 GMT It could be done first, it depends on parallelism. However, results are returned from the full text catalogs and then merge with the results set returned by userid=5 to whittle down the results to matches of both conditions.
 Signature Hilary Cotter Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com
> In Yukon, if I do a fulltext search and also put let's say "userid = 5" in > the where clause, for example, [Where contains("description","yukon") and [quoted text clipped - 6 lines] > > --Xin Chen John Kane - 25 Feb 2005 16:32 GMT Xin Chen, I believe you are concerned about the Yukon FTS service (MSFTESQL) reading all of a large FT Catalog when only results for a subset of rows (userid = 5) is required and therefore taking a long time to return results. Correct?
If so, then Yukon FTS does understand this restriction in the where clause and will only read the *rows* in the FT Catalog based upon the where clause restriction and will return results faster. If not, could you further explain your question?
Thanks, John
 Signature SQL Full Text Search Blog http://spaces.msn.com/members/jtkane/
> In Yukon, if I do a fulltext search and also put let's say "userid = 5" in > the where clause, for example, [Where contains("description","yukon") and [quoted text clipped - 6 lines] > > --Xin Chen Xin Chen - 26 Feb 2005 01:12 GMT John, That is exactly what I asked. Thank you very much for your answer. By the way, I enjoy you blog a lot. Answer question for you. Sometime we divide a large database into identical smaller ones(let's say 5 databases), but keep all the data structure the same. If I fulltext index on each of the 5 databases, then is there a way I can merge the full text search results from 5 database server to provide a complete search result? Does Yukon offer this? I think it is critical feature when you have distributed data. If Yukon doesn't offer this, do you have any idea on how best handle this? Thank you very much !
--Xin Chen
> Xin Chen, > I believe you are concerned about the Yukon FTS service (MSFTESQL) reading [quoted text clipped - 20 lines] > > > > --Xin Chen John Kane - 26 Feb 2005 04:16 GMT You're welcome, Xin Chin! Yes, I believe you can do something similar as you're hinting at via Linked Servers between each database or server. The following is quoted from the SQL Server 2005 Dec '04 CTP BOL "Support for CONTAINS full-text predicate against linked servers SQL Server full-text search functionality can be applied on text data in remote servers when you used the CONTAINS predicate with SELECT queries against linked servers. In Microsoft? SQL ServerT 2005, you can use four-part names in CONTAINS or FREETEXT full-text predicates to execute queries against linked servers."
As I've not yet tested this, and there is no exact syntax sample in the Yukon BOL on how to do this between databases on the same server (vs. on different servers), but I believe the syntax would look something like this after establishing the Linked Servers:
select * from tableA where CONTAINS(serverA.databaseA..tableA.columnA, ' "searchA" ') union all select * from tableA where CONTAINS(serverA.databaseB..tableA.columnA, ' "searchB" ') union all select * from tableA where CONTAINS(serverA.databaseC..tableA.columnA, ' "searchC" ') -- etc...
However, Yukon FTS does not support Distributed Partition Views, so having distributed or propagated search servers, will have to wait till the next version of SQL Server after Yukon...
Hope that helps & keep reading my blog as I'll be posting more on Yukon in the very near future! Thanks, John
 Signature SQL Full Text Search Blog http://spaces.msn.com/members/jtkane/
> John, That is exactly what I asked. Thank you very much for your answer. By > the way, I enjoy you blog a lot. [quoted text clipped - 36 lines] > > > > > > --Xin Chen Xin Chen - 26 Feb 2005 15:00 GMT Thanks again, John, I will try it out on Yukon. I am not sure if the Union All will still make the rank relevent. I heard Microsft said that rank from one database will be irrelvent to that of other database on full text search.
Another thing I worried is the performance. If the search research is large data set, merge result and union them on a single server will mean large network traffic. I wish I know how google does this. I am sure they must use some type of distributed search. How could they do it so FAST?? Any opinions?
--Xin Chen
> You're welcome, Xin Chin! > Yes, I believe you can do something similar as you're hinting at via Linked [quoted text clipped - 79 lines] > > > > > > > > --Xin Chen John Kane - 26 Feb 2005 18:02 GMT You're welcome, Xin Chen, Keep in mind that the Yukon Dec '04 CTP BOL is still a working effort, i.e., not complete and that the entry for "Querying Linked Servers" [ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/fulltxt9/html/0e0eaf74-442e-41ed-8198 -ba2c6337e54b.htm] only references CONTAINS and FREETEXT that can be used with linked servers. Furthermore, FREETEXTTABLE [ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/4523ae15-4260-40a7-a53c -8df15e1fee79.htm] states that "table cannot specify a server name and cannot be used in queries against linked servers", while CONTAINSTABLE has no such restriction. The latter might be a "DOC" bug, i.e., CONTAINSTABLE may also have the same restriction as FREETEXTTABLE, but not yet documented.
If CONTAINSTABLE and FREETEXTTABLE cannot be used with Linked Servers and as they are the only commands that return RANK, then returning RANK from a cross-database or cross-server join via linked servers may not be possible.
Performance is always a concern in situations such as a cross-database join (FTS or non-FTS) and like they say, the proof is in the pudding and testing would need to be done to confirm that such a multi-database FTS join would perform well. Still, its an interesting feature that is not available in SQL Server 2000 that will have to be more explored under SQL Server 2005!
Yes, Google does use their own-brand of distributed search. Here is a paper that describes the Google Cluster Architecture - http://www.computer.org/micro/mi2003/m2022.pdf. Note, that their "15,000 commodity class PC's with fault-tolerant software" is now reported to be over 100,000 today. Comparing a new database-specific search feature to an Internet-scale Search Engine, is not an apple-to-apple comparison, but still Microsoft SQL FTS Dev could take some pointers from Google for the next version of SQL Sever after Yukon, IMHO...
Regards, John
 Signature SQL Full Text Search Blog http://spaces.msn.com/members/jtkane/
> Thanks again, John, I will try it out on Yukon. I am not sure if the Union > All will still make the rank relevent. I heard Microsft said that rank from [quoted text clipped - 100 lines] > > > > > > > > > > --Xin Chen Xin Chen - 27 Feb 2005 03:33 GMT John, thanks again for your reply and the great article you mentioned. I will be reading it very carefully to see if I can find something I can use.
I had once written a crawler written in C# to collect the web pages and store them in Yukon. I think Yukon (beta 2) corrupted the database when I got to about 1 million page in it. I broke the text content of 1 page into 10 row, so the database(FT) is about 10 million rows. I had hope to see Yukon to index 10 million web page with adequate performance. I think in term of full text indexing and search, its fundamental techniques are pretty common. It just matter of who can implement it better and get the best scalability(or distributed search I believe) which is badly needed when dealing with billion's web pages.
--Xin Chen
> You're welcome, Xin Chen, > Keep in mind that the Yukon Dec '04 CTP BOL is still a working effort, i.e., > not complete and that the entry for "Querying Linked Servers" [ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/fulltxt9/html/0e0eaf74-442e-41ed-8198
> -ba2c6337e54b.htm] only references CONTAINS and FREETEXT that can be used > with linked servers. Furthermore, FREETEXTTABLE [ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/4523ae15-4260-40a7-a53c
> -8df15e1fee79.htm] states that "table cannot specify a server name and > cannot be used in queries against linked servers", while CONTAINSTABLE has [quoted text clipped - 141 lines] > > > > > > > > > > > > --Xin Chen
|
|
|