SQL Server Forum / Programming / SQL / July 2008
How to tag 500 rows of a table without a unique key
|
|
Thread rating:  |
Ian Boyd - 03 Jul 2008 22:14 GMT i want to flag 500 rows in a staging table as the rows i'm going to process right now. Once they've been processed they will be deleted, and another 500 will get marked. i don't really care which 500 rows, just 500 of them. If the rows has a unique key, i would do an UPDATE ImportStaging SET ImportFlag = 1 WHERE UniqueKeyColumnName IN ( SELECT TOP 500 UniqueKeyColumnName FROM ImoprtStaging)
But since there's no unique key column, i cannot do that. Also, i don't want to use the depricated SET ROWCOUNT = 500
(unless you want to put $100,000 into escrow to cover the costs of updating the import process to not use ROWCOUNT when it finally goes away)
The staging table DDL is as follows. Note the last column "ImportFlag" which i want to set to "1" for 500 arbitrary rows:
CREATE TABLE ImportStaging ( [a] [uniqueidentifier] NOT NULL , [b] [uniqueidentifier] NULL , [c] [uniqueidentifier] NULL , [d [tinyint] NOT NULL CONSTRAINT [DF_ImportStaging_d] DEFAULT (0), [e] [int] NULL , [f] [varchar] (50) NULL , [g] [varchar] (100) NOT NULL , [h] [varchar] (100) NOT NULL , [i] [varchar] (250) NULL , [j] [varchar] (100) NULL , [k] [varchar] (50) NULL , [l] [varchar] (50) NULL , [m] [varchar] (100) NULL , [n] [varchar] (50) NULL , [o] [datetime] NULL , [p] [varchar] (200) NULL , [q] [int] NULL , [r] [int] NULL , [s] [varchar] (200) NULL , [ImportFlag] [tinyint] NULL CONSTRAINT [DF_ImportStaging_ImportFlag] DEFAULT (0) )
CREATE INDEX [IX_ImportStaging_e] ON [dbo].[ImportStaging]([e], [f])
i can also say that the column tuple (e,f) _should_ be unique. My first thought was:
UPDATE ImportStaging SET ImportFlag = 1 WHERE EXISTS ( SELECT TOP 500 * FROM ImportStaging i2 WHERE i2.e = ImportStaging.e AND i2.f = ImportStaging.f)
...but seems like it won't work - i have a feeling that the exists is satisfied when i don't want it to be (i.e. always)
So i'm fiddling with something like:
UPDATE ImportStaging SET ImportFlag = 1 FROM ImportStaging INNER JOIN ( SELECT TOP 500 e,f FROM ImportStaging) dervivedTable ON ImportStaging.e = derivedTable.e AND ImportStaging.f = derivedTable.f
Does that seem reasonable? (The update statement...not the scenario i've set up).
Or perhaps someone can think of an IN or EXISTS version, rather than a join.
Adi - 03 Jul 2008 22:43 GMT > i want to flag 500 rows in a staging table as the rows i'm going to process > right now. Once they've been processed they will be deleted, and another 500 [quoted text clipped - 68 lines] > > Or perhaps someone can think of an IN or EXISTS version, rather than a join. Why don't you have any primary key and why don't you use normal names for your columns? Also you didn't specify the version that you are using. If you are using SQL Server 2000, then just use the set rowcount statement. If you use a newer version, you can use UPDATE TOP (500) instead.
Adi
Ian Boyd - 04 Jul 2008 00:45 GMT > Why don't you have any primary key a) there is no natural key b) It's not my table
> and why don't you use normal names a) column names obfuscated to protect the innocent b) column names obfuscated to avoid people focusing on things are not the question
> Also you didn't specify the version that you are using. i was hoping for something fairly standard on SQL Server, that would be usable for anyone stumbling across this post 15 years from now.
But let's assume it has to run on 2000 and 2005 - since it does.
Roy Harvey (SQL Server MVP) - 03 Jul 2008 23:01 GMT You are already close.
UPDATE TOP (500) ImportStaging SET ImportFlag = 1
Roy Harvey Beacon Falls, CT
>i want to flag 500 rows in a staging table as the rows i'm going to process >right now. Once they've been processed they will be deleted, and another 500 [quoted text clipped - 68 lines] > >Or perhaps someone can think of an IN or EXISTS version, rather than a join. Ian Boyd - 04 Jul 2008 00:49 GMT > You are already close. > > UPDATE TOP (500) ImportStaging > SET ImportFlag = 1 Doesn't work - in 2000?
Roy Harvey (SQL Server MVP) - 04 Jul 2008 01:26 GMT >> You are already close. >> >> UPDATE TOP (500) ImportStaging >> SET ImportFlag = 1 > >Doesn't work - in 2000? Nope. But of course you had not specified 2000. It will probably still work in 15 years though, IF SQL Server is still the tool.
If the rows are not unique, and you have to process exactly 500 at a time, then you are stuck with SET ROWCOUNT. If the rows are almost unique and you can settle for being close to 500 each time you might be able to do:
UPDATE ImportStaging SET ImportFlag = 1 FROM (SELECT TOP (500) * FROM ImportStaging as X) WHERE ImportStaging.col01 = X.col01 AND ImportStaging.col02 = X.col02 AND ImportStaging.col03 = X.col03 AND ImportStaging.col04 = X.col04 ... AND ImportStaging.col99 = X.col99
where every column in the table is compared.
Roy Harvey Beacon Falls, CT
Plamen Ratchev - 03 Jul 2008 23:04 GMT On SQL Server 2005 you can use the MS specific UPDATE TOP syntax:
UPDATE TOP(500) ImportStaging SET ImportFlag = 1;
Also, SET ROWCOUNT will still work in SQL Server 2008 and will be depreciated in future version: http://technet.microsoft.com/en-us/library/ms188774(SQL.100).aspx
HTH,
Plamen Ratchev http://www.SQLStudio.com
Ian Boyd - 04 Jul 2008 00:52 GMT > UPDATE TOP(500) ImportStaging > SET ImportFlag = 1; > > Also, SET ROWCOUNT will still work in SQL Server 2008 and will be > depreciated in future version: > http://technet.microsoft.com/en-us/library/ms188774(SQL.100).aspx Depricated means, "please don't use it anymore, and certainly don't write new stuff with it". Not "go ahead keep on using it until we finally pull the rug out from under you."
Put it another way What Would Celko Do....
...after he recovered from his anurism because there's no natural key, primary key, foreign keys, database integrity, nor is it even a table, and there are beams from outer space.
Plamen Ratchev - 04 Jul 2008 03:38 GMT > Depricated means, "please don't use it anymore, and certainly don't write > new stuff with it". Not "go ahead keep on using it until we finally pull > the rug out from under you." You noted in your post to Roy you use SQL Server 2000. As the product mainstream support ended April 8, 2008, does it mean you in fact work with depreciated product? http://support.microsoft.com/lifecycle/search/?sort=PN&alpha=sql&x=9&y=10
In my opinion refactoring of code is a major step in upgrades between product versions. As products mature new features are introduced and their use can significantly improve an application. Just look at the ranking functions in SQL Server 2005.
> Put it another way What Would Celko Do.... > > ...after he recovered from his anurism because there's no natural key, > primary key, foreign keys, database integrity, nor is it even a table, and > there are beams from outer space. You forgot to note the use of assembly language flags...
:) Plamen Ratchev http://www.SQLStudio.com
TheSQLGuru - 04 Jul 2008 02:11 GMT Copy all the data into a new table with an identity on it and use that. Or get past your fear of deprecation (or plan for it) and use set rowcount.
 Signature Kevin G. Boles Indicium Resources, Inc. SQL Server MVP kgboles a earthlink dt net
>i want to flag 500 rows in a staging table as the rows i'm going to process >right now. Once they've been processed they will be deleted, and another [quoted text clipped - 69 lines] > Or perhaps someone can think of an IN or EXISTS version, rather than a > join. Erland Sommarskog - 04 Jul 2008 09:41 GMT > i want to flag 500 rows in a staging table as the rows i'm going to > process right now. Once they've been processed they will be deleted, and [quoted text clipped - 4 lines] > SELECT TOP 500 UniqueKeyColumnName > FROM ImoprtStaging) Assuming that this table has quite some size, it's going to be painful, unless you have some good index strategy. My experience is that for batching to be meaningful, you need to align with the clustered index. It seems from your post, that this table does not have a clustered index, and, again assuming that that the table is huge, it could take some time do build one.
In any case, my main point is that it is a bad idea to stick with a fixed number. You have an index over e which is an integer column. Examine this column a bit, and then handle the rows as intervals over this column. Copy the rows to be handled to a temp table (rather than using the import flag) using a condition like WHERE e >= @start AND e < @start + @range
delete the rows, and then increase @start with @range. You may handle 1000 rows in some batches, and other batches will only have a 100 rows, but that should not be a big deal. If you really want to have a fixed batch size, adding an IDENTITY column is probably the best option, although for the existing table that may be expensive. If the column is there when the table is loaded it's less of an issue. If you can have a clustered index on the IDENTITY column, the increased loading time, may be compensated by more effective batching.
 Signature Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Ian Boyd - 04 Jul 2008 14:16 GMT > Assuming that this table has quite some size, it's going to be painful, > unless you have some good index strategy. So the generally accepted answer is that there's no nice way to do it as the DDL is. Which is fine, question answered.
Now, in order to drive Celko nuts, i'll add a meaningless clustered identity primary key, and do my ranks based on that. It will be a completely meaningless artificial primary key, and i'll have no data integrity whatsoever - and in fact it won't even be a table.
Stuart Ainsworth - 04 Jul 2008 14:58 GMT > Now, in order to drive Celko nuts, i'll add a meaningless clustered identity > primary key, and do my ranks based on that. It will be a completely > meaningless artificial primary key, and i'll have no data integrity > whatsoever - and in fact it won't even be a table. But according to your posts, you don't have any data integrity now, nor is it a table. When you're already in the pool, why are you worried about getting wet?
Stu
Gert-Jan Strik - 04 Jul 2008 20:42 GMT > > Assuming that this table has quite some size, it's going to be painful, > > unless you have some good index strategy. > > So the generally accepted answer is that there's no nice way to do it as the > DDL is. Which is fine, question answered. [snip]
Yes there is, even with your unnormalized DDL, and others have posted such solution.
But not if you require it to run on at least 4 generations of SQL Server (2000, 2005, 2008 and the first version following 2008). How realistic is that requirement? And are you willing to sacrifies current or future performance gains for that requirement?
By the way, another (very inefficient) solution is to use a cursor to loop through your table data, and mark 500 rows one by one.
 Signature Gert-Jan SQL Server MVP
|
|
|