SQL Server Forum / DB Engine / SQL Server / July 2008
Large Quantities Of Data
|
|
Thread rating:  |
info@colinramsay.co.uk - 10 Jul 2008 13:12 GMT Hi all,
I have little experience with SQL Server when it's filled with lots and lots of data. In fact I'm not really that sure what would be considered a lot of data!
Basically I am involved with a system which will be accumulating at least a billion rows per year, and most of that is going to be on one table. All of that data needs to be available for real-time reporting at any time.
I'd like to draw on community knowledge here, as I'm naive in the extreme about applications like this. Are there any thoughts or immediate problems that you could see?
Best Regards, Colin Ramsay
Linchi Shea - 10 Jul 2008 13:26 GMT > I'd like to draw on community knowledge here, as I'm naive in the > extreme about applications like this. That sounds dangerous because you just can't be naive when trying to manage a table that grows at 1 billion rows per year. In any event, one of the most effective technique in managing large databases is partition. Or at an even higher level, the approach is to reduce--as much as possible--the unit of operations that threaten your service availability.
Linchi
> Hi all, > [quoted text clipped - 13 lines] > Best Regards, > Colin Ramsay Dan Guzman - 10 Jul 2008 13:28 GMT > I'd like to draw on community knowledge here, as I'm naive in the > extreme about applications like this. Are there any thoughts or > immediate problems that you could see? Attention to detail (especially indexing) is the main key to performance. With an efficient path to the data, queries should perform well regardless of table size.
Manageability is a major consideration with large tables. For example, you'll need to maintain about 120% free space in order to rebuild the clustered index. Table partitioning can help in this regard and improve performance of certain types of scans too. Also SQL 2008 will introduce compression features to reduce space requirements and improve performance and manageability. Note that these features are available only in Enterprise and Developer editions.
 Signature Hope this helps.
Dan Guzman SQL Server MVP http://weblogs.sqlteam.com/dang/
> Hi all, > [quoted text clipped - 13 lines] > Best Regards, > Colin Ramsay info@colinramsay.co.uk - 10 Jul 2008 14:26 GMT On Jul 10, 1:28 pm, "Dan Guzman" <guzma...@nospam- online.sbcglobal.net> wrote:
> > I'd like to draw on community knowledge here, as I'm naive in the > > extreme about applications like this. Are there any thoughts or [quoted text clipped - 39 lines] > > Best Regards, > > Colin Ramsay Thanks for the replies so far.
> That sounds dangerous because you just can't be naive when trying to manage > a table that grows at 1 billion rows per year. I agree, which is why I'm trying to expand my knowledge before diving in head first.
Any more thoughts would be much appreciated!
TheSQLGuru - 10 Jul 2008 16:01 GMT My thought is that you absolutely MUST get someone on board (permanent or more likely a consultant) that has experience with very large sql server installations. It is simply unthinkable that you, with little to no experience, can successfully pull off a project of this magnitude. And I don't mean that as a slam against you but a simple statement of fact. I can't do theoretical physics (despite the fact that I have a degree in physics :) because I haven't been trained nor am I experienced in it, and if I were thrust into a position where that was a requirement I would get help from a professional who could.
 Signature Kevin G. Boles Indicium Resources, Inc. SQL Server MVP kgboles a earthlink dt net
On Jul 10, 1:28 pm, "Dan Guzman" <guzma...@nospam- online.sbcglobal.net> wrote:
> > I'd like to draw on community knowledge here, as I'm naive in the > > extreme about applications like this. Are there any thoughts or [quoted text clipped - 39 lines] > > Best Regards, > > Colin Ramsay Thanks for the replies so far.
> That sounds dangerous because you just can't be naive when trying to > manage > a table that grows at 1 billion rows per year. I agree, which is why I'm trying to expand my knowledge before diving in head first.
Any more thoughts would be much appreciated!
info@colinramsay.co.uk - 10 Jul 2008 17:16 GMT > My thought is that you absolutely MUST get someone on board (permanent or > more likely a consultant) that has experience with very large sql server [quoted text clipped - 72 lines] > > Any more thoughts would be much appreciated! I think that's a fair point, and definitely an option. But there's no point in paying someone to consult on something when you're not sure if there's anything to consult upon! I could have come here and been told that so long as I do xyz, SQL Server will happily handle that level of data. So again, this is really more about fact-finding than "oh god I've got a billion rows and my server's dying please help".
To be honest, I'd love to just get someone in and say "hey, handle the database stuff will you?".
SB - 11 Jul 2008 09:41 GMT On Jul 10, 10:16 pm, "i...@colinramsay.co.uk" <i...@colinramsay.co.uk> wrote:
> > My thought is that you absolutely MUST get someone on board (permanent or > > more likely a consultant) that has experience with very large sql server [quoted text clipped - 84 lines] > > - Show quoted text - Hi, When there is a table with billion rows then you must reduce how many times you scan the table. You should try to minimize the scans and try to do everything in one scan if possible. If you issue two update using two separate queries then you will be hit twice the cost, instead you should try to do all updates using one statement and stuck with one scan. Indexes could help but may not be possible in some situations. Avoid UDF (user defined functions) at all costs. Also, you may consider not allowing write permission to large tables as users (novice) may write queries (stored procedures) that might slow down the server. Basically, you have to be very careful touching that table in any shape or form. You can even be creative and chunk out some data into a separate table and store them. Then you have to run some procedures that will go through the main large table (preferably then the server is idle) and populate those temp table (permanent). Good luck.
Ben - 12 Jul 2008 15:04 GMT > On Jul 10, 10:16 pm, "i...@colinramsay.co.uk" <i...@colinramsay.co.uk> > wrote: [quoted text clipped - 104 lines] > the server is idle) and populate those temp table (permanent). Good > luck. Do you know if there are any indexes built on your table right now? More info would be helpful if you have any. Who designed the schema? I would start with asking yourself how the data is being used. Is it being used as an archive or possibly an OLTP system? When you have that figured out then you will want to build one clustered and a couple non-clustered indexes on key columns in your table. After that is done you will want to think about a good backup and DR plan. Again this all depends on the use of the data.
You may need to do partitioning as someone mentioned and or split your database in file groups. Being practical this is allot of work depending how many people are affected by this system in your organization, so the planning and design may be more or less difficult. Also there are different levels of bureaucracy in any organization.
Hope this helps. Good luck.
-Ben
JXStern - 12 Jul 2008 17:19 GMT I'll just echo the good advice you've already been given, but add a few more points. Just FYI, something like a million rows is small these days and would require only modest care, but a billion, yes, you're getting up there, that's a thousand times what modest care can handle!
(oh, and that's an American billion, 1e9, I hope, if it's a British 1e12, you're in big trouble! let's assume you meant 1,000,000,000)
A billion rows for real-time reporting? Come now. I presume only a few of those rows will be current at any given time. On any system you'll want to do a lot of preprocessing of the data to date, in order that real-time reports (eg, dashboards) can be done quickly.
Are these big rows? How much disk space are you going to need, for starters? Remember, database storage adds overhead.
Are you sure it's a billion rows? Maybe it's 365 rows with a ten megabyte blob in each.
How are they arriving? If in real time evenly 24x7, that's about 2k rows/minute, say 30/second. If they arrive in big files 10,000,000 at a time to import, that's still an issue.
What's the application? If this is something common, you might find a solution is available off the shelf.
Josh
>Hi all, > [quoted text clipped - 13 lines] >Best Regards, >Colin Ramsay info@colinramsay.co.uk - 13 Jul 2008 22:06 GMT > I'll just echo the good advice you've already been given, but add a > few more points. Just FYI, something like a million rows is small [quoted text clipped - 45 lines] > >Best Regards, > >Colin Ramsay Thanks for the further posts guys.
The application is being designed from scratch, so we can design it with efficiency in mind. The rows are going to contain very little data, but will be coming in fairly regularly over the day. JXStern, you're probably right that most common reports could be pre-processed, but the option will need to be there to compare trends over any given period.
|
|
|