Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Other Technologies / Full-Text Search / August 2006

Tip: Looking for answers? Try searching our database.

Blobs and SQL Server

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
bormesh@gmail.com - 21 Aug 2006 22:15 GMT
Hey Hilary,

Mike from microsoft.public.sqlserver.programming told me to talk to
you about my question. Here's my original post:

I'm looking for some information about storing BLOBs in SQL. A bit of
background: my company is developing a document management system.
Previously, we had stored the files on the file structure and used a
pointer in the database. We're not switching because the next iteration
of our system is going to allow for full text searches, and this
becomes a simple task when the entire document is stored in the
database.

My question is how performance will be affected when I want to import
large documents/files into the database? Will I see a huge slow down if
I import a 500 MB file? I've researched this a bit, and I don't want to
use a bit array to import the document into the database. I'd rather
just import it (but will memory be affected horribly?)

If anyone could help me out a bit, or just point me in the right
direction, that would be great.

Thanks for your help,

Rob Heinen
Hilary Cotter - 22 Aug 2006 02:18 GMT
Yes, there is a performance hit while importing blobs, mainly disk and to a
lesser degree cpu. 500 Mgs is a large chunk of data, there is no easy way to
push it in. Any way you could extract the textual content and store it in
the db?

In general you store the blob in the database for manageability reasons. For
performance reasons you store the path in the db and keep the blob in the
file system. Research has indicated that you should store the blobs in a
directory structure maintained by a hash so that you have a large number of
subdirectories. retrieving a blob this way offers best seek performance in
NTFS. Also web servers are optimized for sending files, databases aren't.
You can also send the file to the client asynchronously using an iframe, if
you are retrieving it from the db it would be done serially.

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Hey Hilary,
>
[quoted text clipped - 21 lines]
>
> Rob Heinen
bormesh@gmail.com - 22 Aug 2006 21:19 GMT
Thanks for the information Hilary,

When I'm inserting the blob into the database, I was under the
impression that it had to be split up into a series of smaller files.
If a large file is inserted, then it will require many small chunks. As
we insert, I assume we're going to be using RAM to store the chunks and
then we'll hit back and forth between the file system and the database,
all the while using up our memory. Am I correct in this assumption? I
guess I'm just wondering at the feasiblity of actually inserting a
larger blob (maybe not as huge as 500 mb) on a slower computer
(assuming the computers that we're developing for have an average of
512MB of RAM). Or does the amount of RAM on the client system not
matter, and only the specs of our Database machine matter?

I hope this makes sense... I can give you more information if you need
it.

Thanks,
Rob Heinen

> Yes, there is a performance hit while importing blobs, mainly disk and to a
> lesser degree cpu. 500 Mgs is a large chunk of data, there is no easy way to
[quoted text clipped - 49 lines]
> >
> > Rob Heinen
Hilary Cotter - 23 Aug 2006 00:23 GMT
Older versions of SQL Server required the blob be loaded in chunks, older
versions of ADO required you use the getchunk method, but newer versions
allow you use to stream object to push it in. The client will consume
virtual ram on the host it is running on, not necessarily the SQL Server
box.

There is a cpu and disk IO hit while inserting the blob. I have not
observered a RAM hit.
Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> Thanks for the information Hilary,
>
[quoted text clipped - 75 lines]
>> >
>> > Rob Heinen
Kevin.w.Berridge@gmail.com - 24 Aug 2006 18:24 GMT
Your first suggestion was: "Any way you could extract the textual
content and store it in
the db?"

We could do this, but we thought it might be better to store the binary
in the database and not have a copy of the binary on the file system
with a copy of the full text in the database and a pointer in the
database to the binary file as well as a mechanism to extract the full
text from the file.  What's your take on this?

Also, even inserting the text instead of the binary, if the file is
sufficiently large I can imagine a situation in which inserting the
blob the standard way could cause some very un-needed hard drive
thrashing.  (The standard way being reading all the text into a
variable and then supplying that variable as a parameter to an ad hoc
SQL query or Stored Procedure.)  In ADODB there was the Stream object,
but there doesn't seem to be an equivalent in ADO.NET.  Do you have a
suggestion for a good way to go about this?

Finally, we also wanted to return a sample of the text that matched the
search with our results.  I know that this is not possible in SQL
Server, so I'm wondering there might be an entirely different approach
for both storing and searching these documents that we should consider.

Thanks for all your help and feedback!
Kevin
Kevin.w.Berridge@gmail.com - 29 Aug 2006 19:31 GMT
"Research has indicated that you should store the blobs in a
directory structure maintained by a hash so that you have a large
number of
subdirectories."

Could you expound on this a little more?  We have decided to continue
storing our documents in the file system. I've been looking for some
sources detailing the kind of directory structure you mention but so
far no luck.

Thanks,
Kevin
Hilary Cotter - 29 Aug 2006 21:50 GMT
Have a look at
http://groups.google.com/group/microsoft.public.win2000.file_system/browse_frm/t
hread/e7b4d86095272427/257d6a0a740f9d6c?lnk=st&q=%22maximum+number+of+files%22+%
22in+a+folder%22+windows&rnum=1&hl=en#257d6a0a740f9d6c


and

http://www.mcse.ms/archive42-2004-12-1284650.html

Basically on previous projects I have worked on we got optimal performance
by having 100 subdirectories off the root, each with 100 subdirectories in
it, each with 100 files. Also you get better performance with file name like
00000filename.jpg than filename00000.jpg.

Signature

Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

> "Research has indicated that you should store the blobs in a
> directory structure maintained by a hash so that you have a large
[quoted text clipped - 8 lines]
> Thanks,
> Kevin
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.