Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / January 2005

Tip: Looking for answers? Try searching our database.

Storing large documents in SQL Server

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
ananthapus@hotmail.com - 21 Jan 2005 16:11 GMT
Hi,

We are debating whether it is a good idea to store large documents (on the average over 100 MB - max size currently at 220  MB) in SQL server itself OR to strip the document of the "noise" words outside of sql server and store just the words we need to be indexed.

We are concerned about the performance on both approaches......

Any ideas? Any prior experiences..........

Appreciate your feedback

Anantha  
Hilary Cotter - 21 Jan 2005 17:27 GMT
That depends on the amount of text content in these documents. According to
BOL:

For full-text indexing, a document must be less than 16 megabytes (MB) in
size and must not contain more than 256 kilobytes (KB) of filtered text.

This is my experience as well.

Stripping out noise words from your content and your search phrase is an
option. Your compression factor may not be as dramatic as you think though,
but this depends on the number of words you can consider to be noise words
and the frequency of their occurrence in your data.

The performance implications are not really that significant. You should
strip the noise words out on the client end before it hits SQL Server - both
for loading and querying the data.

With large numbers of noise words (in your noise word list) you will see
marginal improvement in indexing and querying time.

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

> Hi,
>
> We are debating whether it is a good idea to store large documents (on the average over 100 MB - max size currently at 220  MB) in SQL server itself OR
to strip the document of the "noise" words outside of sql server and store
just the words we need to be indexed.

> We are concerned about the performance on both approaches......
>
[quoted text clipped - 7 lines]
> Sent via Fuzzy Software @ http://www.fuzzysoftware.com/
> Comprehensive, categorised, searchable collection of links to ASP & ASP.NET resources...
ananthapus@hotmail.com - 21 Jan 2005 19:28 GMT
Thanks Hilary.  

You keep referring to BOL.  Do you happen to have the URL for BOL?  Appreciate if you could send it to me.  Will help me in implementation.  

Thanks,
Anantha
Hilary Cotter - 21 Jan 2005 19:33 GMT
Bol is books on line. Query "Filtering Supported File Types extracted 256"
to see the page I am referring to

Signature

Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

> Thanks Hilary.
>
[quoted text clipped - 6 lines]
> Sent via Fuzzy Software @ http://www.fuzzysoftware.com/
> Comprehensive, categorised, searchable collection of links to ASP & ASP.NET resources...
John Kane - 22 Jan 2005 03:54 GMT
Anantha,
Storing very large MS Word documents in SQL Server vs. in the file system
with the pointers to the MS Word documents stored in SQL Server is more of
an application design decision. As this is one of those FAQ type questions
that have been known to start
religious wars or at the very least a flame war, and while Sharon didn't ask
about other application issues that this question often generates as it is
often an open-ended question, i.e., one that never seems to be answered to
everyone's satisfactions as it usually "depends" upon the application or
upon how one defines the word "best"...

As a rule-of-thumb and depending upon how many files you have, how
frequently they change, how large they are, what the app is, etc. you may
want to review the web site and the following rule of thumb that they used:

< 1 million images or big images (> 1MB) put them in the file system.
> 1 million and < 1 MB images, put them in SQL Server.

For everything in between, either way will work, depending upon if you need
transactional control over your files. Additionally, if you store the files
in a TEXT or IMAGE column, you can also store related metadata about that
file in SQL Server as well for increased searching capabilities. Also, and
obviously with SQL Server you get built-in support for validating the
consistency of the database, indices, backup, restore, etc. However, the MS
Word document size alone should not be the only justification for storing or
not storing large files in SQL Server. As for the max limit of FT Indexable
text for large files, see:

See SQL Server 2000 Books Online (BOL) title "Filtering Supported File
Types"online at MSDN:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/createdb/cm_ful
lad_55mb.asp


Actually, the BOL entry is wrong as there is a DOC bug filed for this
limited in the BOL title "Filtering Supported File Types" - "Note  For
full-text indexing, a document must be less than 16 megabytes (MB) in size
and must not contain more than 256 kilobytes (KB) of filtered text" and this
limit can be over-ridden via KB article: 308771 (Q308771) "PRB: A Full-Text
Search May Not Return Any Hits If It Fails to Index a File" at
http://support.microsoft.com/default.aspx?scid=kb;en-us;308771. and the
FilterProcessMemoryQuota registry key value.  However, you should be careful
in making adjustments to this registry key and incrementally increase it
based upon your server's memory and avg. file sizes.

Hope this helps!
John
Signature

SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

> Hi,
>
> We are debating whether it is a good idea to store large documents (on the average over 100 MB - max size currently at 220  MB) in SQL server itself OR
to strip the document of the "noise" words outside of sql server and store
just the words we need to be indexed.

> We are concerned about the performance on both approaches......
>
[quoted text clipped - 7 lines]
> Sent via Fuzzy Software @ http://www.fuzzysoftware.com/
> Comprehensive, categorised, searchable collection of links to ASP & ASP.NET resources...
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.