SQL Server Forum / DB Engine / SQL Server / July 2008
Increasing chackpoint duration - Expert help appreciated
|
|
Thread rating:  |
CD - 18 Jul 2008 07:53 GMT Greetings,
We are running a test scenario with the purpose of "filling up" our database. The client application that drives the test has 4 threads which simply create rows in a particular table in the database. The load on the server is constant at about 20 transactions/sec and each transaction is designed to have the same amount of data.
In the beginning of the tests, I noticed very infrequent and short checkpoints. For instance, up to 100,000 rows, the checkpoints happened about every 3 minutes or so and lasted for about 15 seconds. Currently, I have 300,000 rows and the checkpoints happen every minute and last about 1 minute. Yes, the system spends as much time checkpointing as in normal operation. This degrades the system performance dramatically.
I also noticed that the rate of checkpoint pages/second was halved as the number of rows increased from 100K to 300K and that the SQL Server memory has stabilized at 1.6GB.
Considering that the load on the server is constant (as described above): - I do not understand the rational behind this behavior, and - what can do to mitigate this problem?
Any suggestion or comment would be greatly appreciated.
Cheers CD
Andrew J. Kelly - 18 Jul 2008 14:23 GMT Is this 2005? First off it is blatantly obvious you do not have a proper hardware or database configuration to properly support that level of transactions if the checkpoints take 15 seconds to finish. You will need to add the proper disks and move the log files onto their own Raid 1 or Raid 10 array. But in any case a checkpoint happens for several reasons but usually either when you do a log backup or you have more transactions in the log than the recovery interval says it will take in seconds to redo. Please read up in BooksOnLine under checkpoints and Recovery interval for details. If you try to throw more dirty pages at the checkpoint than it can handle efficiently it will start to throttle back some and you will have checkpoints more often as well. The memory usage has stabilized at 1.6 because you either don't have more than 2GB or you do not have it configured properly. Bottom line is that you will need to increase your configuration of hardware and SQL Server to accommodate the required load.
 Signature Andrew J. Kelly SQL MVP Solid Quality Mentors
> Greetings, > [quoted text clipped - 25 lines] > Cheers > CD Linchi Shea - 18 Jul 2008 14:44 GMT In addition to Andrew Kelly's comments, I'd like to add that the checkpoint performance is heavily influenced by how the checkpoint I/Os are issued. More specifically, checkpoints are much more efficient when done in large I/Os than in small I/Os. So I'd check Disk Bytes/Write for the data dirve(s) to what that counter value may be. If it's a small value (e.g. close to 8K), then even if you have a good disk subsystem, your checkpoint operations won't be efficient.
See the following for more info: http://sqlblog.com/blogs/linchi_shea/archive/2008/01/19/sql-server-checkpoint-i- o-behavior.aspx
Linchi
> Greetings, > [quoted text clipped - 25 lines] > Cheers > CD CD - 18 Jul 2008 16:30 GMT Hello All,
Thank you for your answers.
Is there any reason why the checkpoint durations might be increasing with time, despite the load on the system to be the same?
My test application attempts to create rows at a constant rate. However, the checkpoint times almost tripled now in comparison to those times in the beginning of the run.
Kind regards Cassiano
On Jul 18, 6:44 am, Linchi Shea <LinchiS...@discussions.microsoft.com> wrote:
> In addition to Andrew Kelly's comments, I'd like to add that the checkpoint > performance is heavily influenced by how the checkpoint I/Os are issued. More [quoted text clipped - 26 lines] > > I also noticed that the rate of checkpoint pages/second was halved as > > the number of rows increased from 100K to 300K and that the SQL Server
> > memory has stabilized at 1.6GB. > [quoted text clipped - 9 lines] > > - Show quoted text - Andrew J. Kelly - 18 Jul 2008 20:07 GMT It's hard to say without seeing all of the evidence. Initially you had lots of available memory and less going on from an overhead perspective. After you fill up the memory SQL Server needs to manage that and that takes CPU. You also are probably doing adhoc queries which bloat the procedure cache and can add even more CPU usage. So now you are probably spending more time managing overall operations. Did you capture perfmon counters for things like CPU, Memory, Batch Requests, IO etc?
 Signature Andrew J. Kelly SQL MVP Solid Quality Mentors
Hello All,
Thank you for your answers.
Is there any reason why the checkpoint durations might be increasing with time, despite the load on the system to be the same?
My test application attempts to create rows at a constant rate. However, the checkpoint times almost tripled now in comparison to those times in the beginning of the run.
Kind regards Cassiano
On Jul 18, 6:44 am, Linchi Shea <LinchiS...@discussions.microsoft.com> wrote:
> In addition to Andrew Kelly's comments, I'd like to add that the > checkpoint [quoted text clipped - 31 lines] > > I also noticed that the rate of checkpoint pages/second was halved as > > the number of rows increased from 100K to 300K and that the SQL Server
> > memory has stabilized at 1.6GB. > [quoted text clipped - 9 lines] > > - Show quoted text - CD - 19 Jul 2008 06:02 GMT On Jul 18, 12:07 pm, "Andrew J. Kelly" <sqlmvpnooos...@shadhawk.com> wrote:
> It's hard to say without seeing all of the evidence. Initially you had lots > of available memory and less going on from an overhead perspective. After [quoted text clipped - 3 lines] > managing overall operations. Did you capture perfmon counters for things > like CPU, Memory, Batch Requests, IO etc? Yes, I captured various perfmon counters. Anything you would suggest to look in particular?
This, however, is a test system that runs in a controlled envirinment where the clients maintain the same load on the server througout the tests. Therefore my puzzlement of why the checkpoints get longer and longer.
Thanks CD
> -- > Andrew J. Kelly SQL MVP [quoted text clipped - 69 lines] > > - Show quoted text - Dan Guzman - 19 Jul 2008 04:06 GMT > In the beginning of the tests, I noticed very infrequent and short > checkpoints. For instance, up to 100,000 rows, the checkpoints [quoted text clipped - 3 lines] > checkpointing as in normal operation. This degrades the system > performance dramatically. Are transaction log files on the same physical disk as data files? Checkpoints can seriously drag down transactional performance when both are on the same drive.
> I also noticed that the rate of checkpoint pages/second was halved as > the number of rows increased from 100K to 300K and that the SQL Server > memory has stabilized at 1.6GB. This might be a symptom of a lot of random I/O to a growing table. Are you perhaps using a guid or other random value as the clustered index key?
 Signature Hope this helps.
Dan Guzman SQL Server MVP http://weblogs.sqlteam.com/dang/
> Greetings, > [quoted text clipped - 25 lines] > Cheers > CD CD - 19 Jul 2008 05:58 GMT Hello Dan,
The disks are distributed as follows: -DISK 1: OS and SQL Server -DISK 2: master, tempdb, and their respective logs -DISK 3, 4: application db -DISK 5: application transaction log
All disks are SAS 15Krpm
Could you please comment on this configuration? Do you know if I would gain in performance by creating my indexes with the SORT_IN _TEMPDB option?
With respect to your second point, you are correct. This is a table growing with a lot of indices keyed by guids. Note that this is not a production scenario but a test system that we put together to see whether we could satisfy the load requirements with a smaller IO infrastructure.
This test is a "fill-up"scenario, where we have multiple clients creating rows on a specific table. Therefore, the large number of changes, and consequently, checkpoints. I still do not understand, however, why the checkpoints get longer as we increase the size of the inventory, if the clients are designed to sustain the load during the whole test. There is no ad-hoc query in this scenario: just creates. I wonder if this has todo with increasing indexes sizes, since we have a number of indices in the tables that we are populating (the indexes pages are also cached and are also flushed during a checkpoint). Does this make sense?
Also, would I experience any gains by increasing the memory of the server and "playing around" with the recovery interval?
Kind regards CD
On Jul 18, 8:06 pm, "Dan Guzman" <guzma...@nospam- online.sbcglobal.net> wrote:
> > In the beginning of the tests, I noticed very infrequent and short > > checkpoints. For instance, up to 100,000 rows, the checkpoints [quoted text clipped - 52 lines] > > - Show quoted text - Dan Guzman - 19 Jul 2008 13:48 GMT > The disks are distributed as follows: > -DISK 1: OS and SQL Server [quoted text clipped - 5 lines] > > Could you please comment on this configuration? I think this is a proper basic disk configuration with local disks (no SAN or array) if you don't need fault tolerance. You might consider moving the app transaction log to DISK 1 and creating another app db data file on DISK 5. This will probably mitigate your data file I/O performance problem by spreading the I/O workload over 3 spindles instead of 2. There shouldn't be any OS paging on a dedicated SQL box so sharing the transaction log with the OS and binaries should be ok.
> Do you know if I would > gain in performance by creating my indexes with the SORT_IN _TEMPDB > option? SORT_IN _TEMPDB option might very well improve index create performance because you have tempdb on a separate drive. By using tempdb for the intermediate sort results instead of the file group, I/O is typically serial and more efficient..
> With respect to your second point, you are correct. This is a table > growing with a lot of indices keyed by guids. I thought this might be the case because your performance profile is the classic example of how random I/O can negatively affect I/O performance. Keep in mind that the slowest part of a disk I/O is head movement. The I/O is concentrated on a small contiguous area of the disk(s) with the table is small so the heads need to move little, if at all, during reads and writes. However, the heads need to move more often and further as more data is inserted. This head movement increases I/O response (pmon Avg. Disk/sec Transfer) so it takes more time to do the same amount of disk work as the table grows.
> Also, would I experience any gains by increasing the memory of the > server and "playing around" > with the recovery interval? You can mitigate the effects with more memory, cache and playing around with the recovery interval. However, at the end if the day, you'll still need to write to disk. IMHO, the best hardware approach is to add more data disks to handle the anticipated workload.
I think you can probably achieve your performance objective on the existing hardware with some application/index tuning. An incremental clustered index key (e.g. NEWSEQUENTIALID or IDENTITY) will provide more optimal and consistent insert performance. Ideally, the non-clustered indexes should be incremental as well but that's of course not always feasible.
FYI, I once worked on an ETL application that used a GUID clustered primary key when importing files into tables that grew into millions of rows. Performance improved by an order of magnitude after I changed to a composite primary key of file id (a sequential integer) and the record id guid. The application could then scale to handle billions of rows.
 Signature Hope this helps.
Dan Guzman SQL Server MVP http://weblogs.sqlteam.com/dang/
CD - 22 Jul 2008 07:54 GMT Hello Dan,
Thank you for your reply.
I was intrigued by your last point:
> FYI, I once worked on an ETL application that used a GUID clustered primary > key when importing files into tables that grew into millions of rows. > Performance improved by an order of magnitude after I changed to a composite > primary key of file id (a sequential integer) and the record id guid. The > application could then scale to handle billions of rows. specially the capability of supporting billions of rows. Could you please clarify in the following example?
Consider the table: CREATE TABLE [A1]( [id] [uniqueidentifier] NOT NULL, [name] [nvarchar](50) NULL, CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED ( [id] ASC)
Are you suggesting adding another column to the table like this?
Option 2: CREATE TABLE [dbo].[A2]( [seq] [int] IDENTITY(1,1) NOT NULL, [id] [uniqueidentifier] NOT NULL, [name] [nvarchar](50) NULL, CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED ([seq] ASC, [id] ASC)
I understand that using this new composite index (in this order) would result in a more "localized" modification of the index. However, in my case, I would need to add another nonclustered index in the id, since many operations on the service that uses this table are parameterized by the id. Also, I would need to add another non- clustered index on the name column.
As a result we would have two indices on table A1 - Primary key: on id - Nonclustered: on name We havve 3 indices on table A2 - Primary key: on (seq, id) - Nonclustered: on id - Nonclustered: on name
As a result of a insert, two indices are updated in A1, while three are updated in A2 Therefore, I do not understand how A2 can be more efficient. Is your reasoning valid only if I could add the composite key without the need to add the non- clustered index in the id?
Am I missing something?
Kind regards CD
|
|
|