
SQL Server Forum / Other Technologies / Full-Text Search / November 2004


MS SQL Server 2000 - full-text search

ananthapus@hotmail.com - 28 Nov 2004 16:38 GMT
I'm trying to choose between MS SQL Server's built-in full-text search and an open-source full-text search engine for our core product.  Does anyone know of any existing benchmarks or comparisons of these two approaches?  I'd appreciate any reply.
John Kane - 28 Nov 2004 19:52 GMT
Anantha,
Unfortunately, no such benchmarks or comparisons exist today, and Microsoft
has never released any either.

However, I have long been kicking around the idea of building a "SQL FTS
Benchmarking Toolkit" along the lines of a TPC benchmark suite, and I
submitted an abstract on it for the 2003 PASS conference. I'm assuming
you're comparing SQL Server 2000 FTS with MySQL FTS and with PostgreSQL and
its TSearch2 or OpenFTS, as I have researched all of these products. Or are
you considering other open-source full-text search products?

In the meantime, what you can do is build a sample database with publicly
available text data from the Moby lexicon project, built by Grady Ward, at
http://www.dcs.shef.ac.uk/research/ilash/Moby/ and then set up a standard
benchmarking test.  Note that this data is freely available and is in the
public domain, per Grady Ward.

Additionally, Microsoft and other RDBMS vendors, such as Oracle and IBM,
compete in standard TPC benchmarking tests to determine which database is
the fastest, using a standard test suite of tools, database schema and data,
such as TPC Benchmark C (http://www.tpc.org/tpcc/detail.asp). The TPC
benchmark that comes closest to a "Full Text Search" benchmark is TPC-W
(http://www.tpc.org/tpcw/default.asp), but this too is mostly a
transactional web e-commerce benchmark and not strictly for FTS queries.

Full Text Indexing (FTI) and Full Text Search (FTS) performance go hand in
hand, and both depend on the language of the text (Moby has word lists in
five languages) and on the size of the data (both the row count and the
amount of text per row). Varying these factors gives a matrix of tests that
measures not only FTI performance, but also FTS query performance with
multiple clients issuing random FTS queries.
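
To make the idea of a test matrix concrete, here is a minimal Python sketch that enumerates every combination of the factors mentioned above. The class name, field names, and all sample values are hypothetical placeholders, not part of any existing toolkit; substitute the languages and sizes you actually load.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BenchmarkCase:
    """One cell of the benchmark matrix (hypothetical structure)."""
    language: str        # language of the loaded text
    row_count: int       # number of rows in the test table
    words_per_row: int   # rough amount of text stored per row
    client_count: int    # concurrent clients issuing FTS queries


def build_test_matrix(languages, row_counts, words_per_row, client_counts):
    """Return one BenchmarkCase per combination of the given factors."""
    return [
        BenchmarkCase(*combo)
        for combo in product(languages, row_counts, words_per_row, client_counts)
    ]


if __name__ == "__main__":
    # Placeholder values only; pick ones that match your own data set.
    matrix = build_test_matrix(
        languages=["lang_a", "lang_b"],
        row_counts=[10_000, 1_000_000],
        words_per_row=[50, 5_000],
        client_counts=[1, 8, 32],
    )
    for case in matrix:
        print(case)
```

Each case would be run twice: once to time index population (FTI) and once to time queries (FTS), so that the two measurements can be compared across the same data.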

Additional factors include both hardware and software configuration, for
example the number and speed of the CPUs as well as the size and type of
L1/L2 cache per CPU. Other hardware factors include the amount of RAM, the
number of disk controllers, the type of RAID disk array, and where the
database files and FT Catalog files are placed. As you can see, this is a
non-trivial effort, and one I plan on documenting for my book on this
subject.

I continue to work on completing the "SQL FTS Benchmarking Toolkit". Until
it is completed, I'd recommend that you download some of the Moby test
files, develop a test database and tables, and load the data into them. Then
use the Microsoft-provided OSTRESS client utility (download at:
http://support.microsoft.com/default.aspx?scid=kb;en-us;887057) to
measure the performance of multiple FTS queries from multiple clients
against your test database, and compare the results with those of the
open-source full-text search products you are considering for your core
product.
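
For comparing engines that OSTRESS cannot drive, a small engine-neutral harness can apply the same "multiple clients, random queries" load to each one. The following is only a sketch in Python and is not OSTRESS itself: `run_query` is a hypothetical callable that you would wire to whichever engine is under test, and `fake_query` is a stand-in used purely to show the harness running.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_client(run_query, search_terms, queries_per_client, seed):
    """Simulate one client issuing random queries; return per-query latencies."""
    rng = random.Random(seed)  # seeded so each run issues the same terms
    latencies = []
    for _ in range(queries_per_client):
        term = rng.choice(search_terms)
        start = time.perf_counter()
        run_query(term)
        latencies.append(time.perf_counter() - start)
    return latencies


def run_benchmark(run_query, search_terms, client_count, queries_per_client):
    """Run client_count concurrent clients and summarise the timings."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=client_count) as pool:
        futures = [
            pool.submit(run_client, run_query, search_terms,
                        queries_per_client, seed)
            for seed in range(client_count)
        ]
        latencies = [lat for future in futures for lat in future.result()]
    elapsed = time.perf_counter() - started
    return {
        "queries": len(latencies),
        "elapsed_s": elapsed,
        "queries_per_s": len(latencies) / elapsed if elapsed > 0 else float("inf"),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
    }


def fake_query(term):
    """Stand-in for a real search call (assumption: replace with your engine)."""
    time.sleep(0.001)
    return [term]


if __name__ == "__main__":
    terms = ["alpha", "beta", "gamma"]  # placeholder search terms
    print(run_benchmark(fake_query, terms, client_count=4, queries_per_client=25))
```

Running the same search terms, client counts, and data set against each engine keeps the comparison fair; only the body of `run_query` should differ between runs.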

Please feel free to contact me if you need additional details.
Regards,
John

ananthapus@hotmail.com - 28 Nov 2004 23:07 GMT
John,

Thanks for your help.  I'm trying to compare MS SQL Server 2000 FTS with the open-source product/framework Lucene.
John Kane - 28 Nov 2004 23:30 GMT
You're welcome, Anantha,
If you're comparing the open-source product/framework Lucene with SQL
Server 2000 FTS (two very different implementations of Full Text Search),
you may be interested in DotLucene - The Open Source Search Engine for .NET
at: http://openlucene.net/. Also, keep in mind that if your goal is to do
full text search of documents (MS Word, HTML, etc.) stored outside of SQL
Server tables, you can also use the Windows Indexing Service and set up a
Linked Server (via the MSIDXS OLE DB provider) to query those documents
alongside other data stored in SQL Server.

Regards,
John

 