Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / DB Engine / SQL Server / July 2008

Tip: Looking for answers? Try searching our database.

Large Quantities Of Data

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
info@colinramsay.co.uk - 10 Jul 2008 13:12 GMT
Hi all,

I have little experience with SQL Server when it's filled with lots
and lots of data. In fact I'm not really that sure what would be
considered a lot of data!

Basically I am involved with a system which will be accumulating at
least a billion rows per year, and most of that is going to be on one
table. All of that data needs to be available for real-time reporting
at any time.

I'd like to draw on community knowledge here, as I'm naive in the
extreme about applications like this. Are there any thoughts or
immediate problems that you could see?

Best Regards,
Colin Ramsay
Linchi Shea - 10 Jul 2008 13:26 GMT
> I'd like to draw on community knowledge here, as I'm naive in the
> extreme about applications like this.

That sounds dangerous because you just can't be naive when trying to manage
a table that grows at 1 billion rows per year. In any event, one of the most
effective technique in managing large databases is partition. Or at an even
higher level, the approach is to reduce--as much as possible--the unit of
operations that threaten your service availability.

Linchi

> Hi all,
>
[quoted text clipped - 13 lines]
> Best Regards,
> Colin Ramsay
Dan Guzman - 10 Jul 2008 13:28 GMT
> I'd like to draw on community knowledge here, as I'm naive in the
> extreme about applications like this. Are there any thoughts or
> immediate problems that you could see?

Attention to detail (especially indexing) is the main key to performance.
With an efficient path to the data, queries should perform well regardless
of table size.

Manageability is a major consideration with large tables.  For example,
you'll need to maintain about 120% free space in order to rebuild the
clustered index.  Table partitioning can help in this regard and improve
performance of certain types of scans too.  Also SQL 2008 will introduce
compression features to reduce space requirements and improve performance
and manageability.  Note that these features are available only in
Enterprise and Developer editions.

Signature

Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

> Hi all,
>
[quoted text clipped - 13 lines]
> Best Regards,
> Colin Ramsay
info@colinramsay.co.uk - 10 Jul 2008 14:26 GMT
On Jul 10, 1:28 pm, "Dan Guzman" <guzma...@nospam-
online.sbcglobal.net> wrote:
> > I'd like to draw on community knowledge here, as I'm naive in the
> > extreme about applications like this. Are there any thoughts or
[quoted text clipped - 39 lines]
> > Best Regards,
> > Colin Ramsay

Thanks for the replies so far.

> That sounds dangerous because you just can't be naive when trying to manage
> a table that grows at 1 billion rows per year.

I agree, which is why I'm trying to expand my knowledge before diving
in head first.

Any more thoughts would be much appreciated!
TheSQLGuru - 10 Jul 2008 16:01 GMT
My thought is that you absolutely MUST get someone on board (permanent or
more likely a consultant) that has experience with very large sql server
installations.  It is simply unthinkable that you, with little to no
experience, can successfully pull off a project of this magnitude.  And I
don't mean that as a slam against you but a simple statement of fact.  I
can't do theoretical physics (despite the fact that I have a degree in
physics :) because I haven't been trained nor am I experienced in it, and if
I were thrust into a position where that was a requirement I would get help
from a professional who could.

Signature

Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net

On Jul 10, 1:28 pm, "Dan Guzman" <guzma...@nospam-
online.sbcglobal.net> wrote:
> > I'd like to draw on community knowledge here, as I'm naive in the
> > extreme about applications like this. Are there any thoughts or
[quoted text clipped - 39 lines]
> > Best Regards,
> > Colin Ramsay

Thanks for the replies so far.

> That sounds dangerous because you just can't be naive when trying to
> manage
> a table that grows at 1 billion rows per year.

I agree, which is why I'm trying to expand my knowledge before diving
in head first.

Any more thoughts would be much appreciated!
info@colinramsay.co.uk - 10 Jul 2008 17:16 GMT
> My thought is that you absolutely MUST get someone on board (permanent or
> more likely a consultant) that has experience with very large sql server
[quoted text clipped - 72 lines]
>
> Any more thoughts would be much appreciated!

I think that's a fair point, and definitely an option. But there's no
point in paying someone to consult on something when you're not sure
if there's anything to consult upon! I could have come here and been
told that so long as I do xyz, SQL Server will happily handle that
level of data. So again, this is really more about fact-finding than
"oh god I've got a billion rows and my server's dying please help".

To be honest, I'd love to just get someone in and say "hey, handle the
database stuff will you?".
SB - 11 Jul 2008 09:41 GMT
On Jul 10, 10:16 pm, "i...@colinramsay.co.uk" <i...@colinramsay.co.uk>
wrote:

> > My thought is that you absolutely MUST get someone on board (permanent or
> > more likely a consultant) that has experience with very large sql server
[quoted text clipped - 84 lines]
>
> - Show quoted text -

Hi,
When there is a table with billion rows then you must reduce how many
times you scan the table. You should try to minimize the scans and try
to do everything in one scan if possible. If you issue two update
using two separate queries then you will be hit twice the cost,
instead you should try to do all updates using one statement and stuck
with one scan. Indexes could help but may not be possible in some
situations. Avoid UDF (user defined functions) at all costs. Also, you
may consider not allowing write permission to large tables as users
(novice) may write queries (stored procedures) that might slow down
the server. Basically, you have to be very careful touching that table
in any shape or form. You can even be creative and chunk out some data
into a separate table and store them. Then you have to run some
procedures that will go through the main large table (preferably then
the server is idle) and populate those temp table (permanent). Good
luck.
Ben - 12 Jul 2008 15:04 GMT
> On Jul 10, 10:16 pm, "i...@colinramsay.co.uk" <i...@colinramsay.co.uk>
> wrote:
[quoted text clipped - 104 lines]
> the server is idle) and populate those temp table (permanent). Good
> luck.

Do you know if there are any indexes built on your table right now?
More info would be helpful if you have any.  Who designed the schema?
I would start with asking yourself how the data is being used.  Is it
being used as an archive or possibly an OLTP system? When you have
that figured out then you will want to build one clustered and a
couple non-clustered indexes on key columns in your table. After that
is done you will want to think about a good backup and DR plan.  Again
this all depends on the use of the data.

You may need to do partitioning as someone mentioned and or split your
database in file groups. Being practical this is allot of work
depending how many people are affected by this system in your
organization, so the planning and design may be more or less
difficult. Also there are different levels of bureaucracy in any
organization.

Hope this helps. Good luck.

-Ben
JXStern - 12 Jul 2008 17:19 GMT
I'll just echo the good advice you've already been given, but add a
few more points.  Just FYI, something like a million rows is small
these days and would require only modest care, but a billion, yes,
you're getting up there, that's a thousand times what modest care can
handle!  

(oh, and that's an American billion, 1e9, I hope, if it's a British
1e12, you're in big trouble! let's assume you meant 1,000,000,000)

A billion rows for real-time reporting?  Come now.  I presume only a
few of those rows will be current at any given time.  On any system
you'll want to do a lot of preprocessing of the data to date, in order
that real-time reports (eg, dashboards) can be done quickly.

Are these big rows?  How much disk space are you going to need, for
starters?  Remember, database storage adds overhead.

Are you sure it's a billion rows?  Maybe it's 365 rows with a ten
megabyte blob in each.

How are they arriving?  If in real time evenly 24x7, that's about 2k
rows/minute, say 30/second.  If they arrive in big files 10,000,000 at
a time to import, that's still an issue.

What's the application?  If this is something common, you might find a
solution is available off the shelf.

Josh

>Hi all,
>
[quoted text clipped - 13 lines]
>Best Regards,
>Colin Ramsay
info@colinramsay.co.uk - 13 Jul 2008 22:06 GMT
> I'll just echo the good advice you've already been given, but add a
> few more points.  Just FYI, something like a million rows is small
[quoted text clipped - 45 lines]
> >Best Regards,
> >Colin Ramsay

Thanks for the further posts guys.

The application is being designed from scratch, so we can design it
with efficiency in mind. The rows are going to contain very little
data, but will be coming in fairly regularly over the day. JXStern,
you're probably right that most common reports could be pre-processed,
but the option will need to be there to compare trends over any given
period.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.