SQL Server Forum / Other Technologies / Full-Text Search / August 2007

newbie: Full Text Search Against PDF Blobs

Des - 14 Aug 2007 01:12 GMT
I have a client for whom this solution sounds perfect.

Does anyone know of a web site where I can test the "Full Text Search
Against PDF Blobs" functionality?

The website "layout design guy" is saying that SQL 2005 will be too slow
and we should use "Lucene", an open source indexer instead.
Does anyone have any info I can use to show this guy that SQL Server
2005 will be faster?

The target site has several thousand report PDFs at 2 MB average each
(about 10 GB in total).

I have watched the video
"http://download.microsoft.com/download/b/3/8/b3847275-2bea-440a-8e2e-305b009bb261/sql_13.wmv"
that was referenced in this group recently.

Thanks,
Des
Hilary Cotter - 14 Aug 2007 03:58 GMT
With Lucene you really have to roll your own solution; it is only a
full-text search engine. You have to write the code that queries it and the
code that feeds it documents to index. Lucene is designed for the 5-10
million document range, but can be scaled much higher. It is optimized to
return results in batches of 10, 20, 25 or 100. If you return all
results, its performance is much worse than SQL FTS.

Lucene allows you to do true property-based searches.

SQL FTS is highly scalable but you really have to think about partitioning
after you hit 50 million rows.

You really have to test to see what works best in your environment.

Signature

relevantNoise - dedicated to mining blogs for business intelligence.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
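
The advice above to test in your own environment, and the point about batched versus full result sets, can be sketched as a small timing harness. This is a minimal sketch only: `time_search` and `fake_search` are hypothetical names, and `fake_search` is an in-memory stand-in that would need to be replaced with a real call into SQL FTS or Lucene before the numbers mean anything.

```python
import time
from typing import Callable, List


def time_search(search: Callable[[str, int], List[str]],
                term: str, limit: int, runs: int = 5) -> float:
    """Return the best wall-clock time in seconds over `runs` calls."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        search(term, limit)
        best = min(best, time.perf_counter() - start)
    return best


def fake_search(term: str, limit: int) -> List[str]:
    """Hypothetical stand-in for a real search call (assumption, not a real API)."""
    corpus = [f"report-{i}.pdf" for i in range(5000)]
    hits = [name for name in corpus if term in name]
    return hits[:limit]


if __name__ == "__main__":
    # Compare small batches against returning every hit.
    for limit in (10, 100, 5000):
        seconds = time_search(fake_search, "report", limit)
        print(f"limit={limit}: {seconds:.6f}s")
```

Running the same harness against each candidate engine, with the batch sizes the site will actually use, gives a like-for-like comparison rather than an argument from reputation.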

Des - 14 Aug 2007 07:07 GMT
Thanks Hilary,

That was very useful indeed.
I would much prefer to stay inside the environments I am familiar with,
if possible.

Do you know of a site that uses SQL FTS to search PDFs stored as blobs?
It would be nice to know of any "known" shortcomings before I attempt
to implement it; otherwise I will think it was something I did.

Thanks,
Des

Hilary Cotter - 14 Aug 2007 11:56 GMT
No, but I know of a site that uses SQL FTS to index over 2 terabytes.

Simon Sabin - 18 Aug 2007 11:41 GMT
Hello Des,

One shortcoming was that there was no 64-bit PDF iFilter, but that has now been
resolved: http://sqlblogcasts.com/blogs/simons/archive/2007/07/18/PDF-64-bit-iFilter-at-last.aspx

We index a few million CVs in Word format with no problem; I wouldn't expect any difference
with PDFs.

Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons
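
For readers new to indexing blobs, the setup discussed in this thread can be sketched roughly as follows. This is an illustrative sketch, not anyone's production schema: the table, column, and catalog names (`dbo.Reports`, `FileExt`, `Content`, `ReportCatalog`) are assumptions, the T-SQL is held in Python strings only so the names are easy to change, and it presumes a PDF iFilter is installed on the server. The `TYPE COLUMN` holds the file extension so that full-text indexing knows which filter to apply to each `varbinary(max)` value.

```python
# T-SQL setup for a full-text index over PDF blobs (all names are hypothetical).
SETUP_SQL = """
CREATE TABLE dbo.Reports (
    ReportId int NOT NULL CONSTRAINT PK_Reports PRIMARY KEY,
    FileExt  nvarchar(8) NOT NULL,      -- e.g. '.pdf'; selects the iFilter
    Content  varbinary(max) NOT NULL    -- the PDF bytes
);
CREATE FULLTEXT CATALOG ReportCatalog;
CREATE FULLTEXT INDEX ON dbo.Reports
    (Content TYPE COLUMN FileExt)
    KEY INDEX PK_Reports ON ReportCatalog;
"""


def top_n_query(top_n: int) -> str:
    """Build a query returning at most `top_n` matching reports.

    `@term` is a T-SQL variable the caller is assumed to declare and set.
    """
    n = int(top_n)
    if n <= 0:
        raise ValueError("top_n must be a positive integer")
    return (f"SELECT TOP ({n}) ReportId FROM dbo.Reports "
            f"WHERE CONTAINS(Content, @term);")


if __name__ == "__main__":
    print(SETUP_SQL)
    print(top_n_query(10))
```

Limiting the result set with `TOP` mirrors the batched retrieval described earlier in the thread and keeps a comparison with other engines fair.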

Des - 18 Aug 2007 16:43 GMT
Thanks Simon - That's good to know :-)

Regards
Des
