Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Programming / SQL / July 2008

Tip: Looking for answers? Try searching our database.

How to tag 500 rows of a table without a unique key

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Ian Boyd - 03 Jul 2008 22:14 GMT
i want to flag 500 rows in a staging table as the rows i'm going to process
right now. Once they've been processed they will be deleted, and another 500
will get marked. i don't really care which 500 rows, just 500 of them.  If
the rows has a unique key, i would do an
   UPDATE ImportStaging SET ImportFlag = 1
   WHERE UniqueKeyColumnName IN (
           SELECT TOP 500 UniqueKeyColumnName
           FROM ImoprtStaging)

But since there's no unique key column, i cannot do that.  Also, i don't
want to use the depricated
   SET ROWCOUNT = 500

(unless you want to put $100,000 into escrow to cover the costs of updating
the import process to not use ROWCOUNT when it finally goes away)

The staging table DDL is as follows. Note the last column "ImportFlag" which
i want to set to "1" for 500 arbitrary rows:

CREATE TABLE ImportStaging (
[a] [uniqueidentifier] NOT NULL ,
[b] [uniqueidentifier] NULL ,
[c] [uniqueidentifier] NULL ,
[d [tinyint] NOT NULL CONSTRAINT [DF_ImportStaging_d] DEFAULT (0),
[e] [int] NULL ,
[f] [varchar] (50) NULL ,
[g] [varchar] (100)  NOT NULL ,
[h] [varchar] (100) NOT NULL ,
[i] [varchar] (250) NULL ,
[j] [varchar] (100) NULL ,
[k] [varchar] (50) NULL ,
[l] [varchar] (50) NULL ,
[m] [varchar] (100) NULL ,
[n] [varchar] (50) NULL ,
[o] [datetime] NULL ,
[p] [varchar] (200) NULL ,
[q] [int] NULL ,
[r] [int] NULL ,
[s] [varchar] (200)  NULL ,
[ImportFlag] [tinyint] NULL CONSTRAINT [DF_ImportStaging_ImportFlag]
DEFAULT (0)
)

CREATE  INDEX [IX_ImportStaging_e] ON [dbo].[ImportStaging]([e], [f])

i can also say that the column tuple (e,f) _should_ be unique. My first
thought was:

UPDATE ImportStaging SET ImportFlag = 1
WHERE EXISTS (
   SELECT TOP 500 *
   FROM ImportStaging i2
   WHERE i2.e = ImportStaging.e
   AND i2.f = ImportStaging.f)

...but seems like it won't work - i have a feeling that the exists is
satisfied when i don't want it to be (i.e. always)

So i'm fiddling with something like:

UPDATE ImportStaging SET ImportFlag = 1
FROM ImportStaging
   INNER JOIN (
       SELECT TOP 500 e,f
       FROM ImportStaging) dervivedTable
   ON ImportStaging.e = derivedTable.e
   AND ImportStaging.f = derivedTable.f

Does that seem reasonable? (The update statement...not the scenario i've set
up).

Or perhaps someone can think of an IN or EXISTS version, rather than a join.
Adi - 03 Jul 2008 22:43 GMT
> i want to flag 500 rows in a staging table as the rows i'm going to process
> right now. Once they've been processed they will be deleted, and another 500
[quoted text clipped - 68 lines]
>
> Or perhaps someone can think of an IN or EXISTS version, rather than a join.

Why don't you have any primary key and why don't you use normal names
for your columns?  Also you didn't specify the version that you are
using.  If you are using SQL Server 2000, then just use the set
rowcount statement.  If you use a newer version, you can use UPDATE
TOP (500) instead.

Adi
Ian Boyd - 04 Jul 2008 00:45 GMT
> Why don't you have any primary key

a) there is no natural key
b) It's not my table

> and why don't you use normal names
a) column names obfuscated to protect the innocent
b) column names obfuscated to avoid people focusing on things are not the
question

> Also you didn't specify the version that you are using.

i was hoping for something fairly standard on SQL Server, that would be
usable for anyone stumbling across this post 15 years from now.

But let's assume it has to run on 2000 and 2005 - since it does.
Roy Harvey (SQL Server MVP) - 03 Jul 2008 23:01 GMT
You are already close.

UPDATE TOP (500) ImportStaging
  SET ImportFlag = 1

Roy Harvey
Beacon Falls, CT

>i want to flag 500 rows in a staging table as the rows i'm going to process
>right now. Once they've been processed they will be deleted, and another 500
[quoted text clipped - 68 lines]
>
>Or perhaps someone can think of an IN or EXISTS version, rather than a join.
Ian Boyd - 04 Jul 2008 00:49 GMT
> You are already close.
>
> UPDATE TOP (500) ImportStaging
>   SET ImportFlag = 1

Doesn't work - in 2000?
Roy Harvey (SQL Server MVP) - 04 Jul 2008 01:26 GMT
>> You are already close.
>>
>> UPDATE TOP (500) ImportStaging
>>   SET ImportFlag = 1
>
>Doesn't work - in 2000?

Nope.  But of course you had not specified 2000.  It will probably
still work in 15 years though, IF SQL Server is still the tool.

If the rows are not unique, and you have to process exactly 500 at a
time, then you are stuck with SET ROWCOUNT.  If the rows are almost
unique and you can settle for being close to 500 each time you might
be able to do:

UPDATE ImportStaging
  SET ImportFlag = 1
 FROM (SELECT TOP (500) *
         FROM ImportStaging as X)
WHERE ImportStaging.col01 = X.col01
  AND ImportStaging.col02 = X.col02
  AND ImportStaging.col03 = X.col03
  AND ImportStaging.col04 = X.col04
  ...
  AND ImportStaging.col99 = X.col99

where every column in the table is compared.

Roy Harvey
Beacon Falls, CT
Plamen Ratchev - 03 Jul 2008 23:04 GMT
On SQL Server 2005 you can use the MS specific UPDATE TOP syntax:

UPDATE TOP(500) ImportStaging
SET ImportFlag = 1;

Also, SET ROWCOUNT will still work in SQL Server 2008 and will be
depreciated in future version:
http://technet.microsoft.com/en-us/library/ms188774(SQL.100).aspx

HTH,

Plamen Ratchev
http://www.SQLStudio.com
Ian Boyd - 04 Jul 2008 00:52 GMT
> UPDATE TOP(500) ImportStaging
> SET ImportFlag = 1;
>
> Also, SET ROWCOUNT will still work in SQL Server 2008 and will be
> depreciated in future version:
> http://technet.microsoft.com/en-us/library/ms188774(SQL.100).aspx

Depricated means, "please don't use it anymore, and certainly don't write
new stuff with it". Not "go ahead keep on using it until we finally pull the
rug out from under you."

Put it another way What Would Celko Do....

...after he recovered from his anurism because there's no natural key,
primary key, foreign keys, database integrity, nor is it even a table, and
there are beams from outer space.
Plamen Ratchev - 04 Jul 2008 03:38 GMT
> Depricated means, "please don't use it anymore, and certainly don't write
> new stuff with it". Not "go ahead keep on using it until we finally pull
> the rug out from under you."

You noted in your post to Roy you use SQL Server 2000. As the product
mainstream support ended April 8, 2008, does it mean you in fact work with
depreciated product?
http://support.microsoft.com/lifecycle/search/?sort=PN&alpha=sql&x=9&y=10

In my opinion refactoring of code is a major step in upgrades between
product versions. As products mature new features are introduced and their
use can significantly improve an application. Just look at the ranking
functions in SQL Server 2005.

> Put it another way What Would Celko Do....
>
> ...after he recovered from his anurism because there's no natural key,
> primary key, foreign keys, database integrity, nor is it even a table, and
> there are beams from outer space.

You forgot to note the use of assembly language flags...

:)

Plamen Ratchev
http://www.SQLStudio.com
TheSQLGuru - 04 Jul 2008 02:11 GMT
Copy all the data into a new table with an identity on it and use that.  Or
get past your fear of deprecation (or plan for it) and use set rowcount.

Signature

Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net

>i want to flag 500 rows in a staging table as the rows i'm going to process
>right now. Once they've been processed they will be deleted, and another
[quoted text clipped - 69 lines]
> Or perhaps someone can think of an IN or EXISTS version, rather than a
> join.
Erland Sommarskog - 04 Jul 2008 09:41 GMT
> i want to flag 500 rows in a staging table as the rows i'm going to
> process right now. Once they've been processed they will be deleted, and
[quoted text clipped - 4 lines]
>             SELECT TOP 500 UniqueKeyColumnName
>             FROM ImoprtStaging)

Assuming that this table has quite some size, it's going to be painful,
unless you have some good index strategy. My experience is that for batching
to be meaningful, you need to align with the clustered index. It
seems from your post, that this table does not have a clustered index,
and, again assuming that that the table is huge, it could take some
time do build one.

In any case, my main point is that it is a bad idea to stick with a
fixed number. You have an index over e which is an integer column.
Examine this column a bit, and then handle the rows as intervals over
this column. Copy the rows to be handled to a temp table (rather than
using the import flag) using a condition like
 
  WHERE e >= @start AND e < @start + @range

delete the rows, and then increase @start with @range. You may handle
1000 rows in some batches, and other batches will only have a 100 rows,
but that should not be a big deal.

If you really want to have a fixed batch size, adding an IDENTITY column
is probably the best option, although for the existing table that may be
expensive. If the column is there when the table is loaded it's less of
an issue. If you can have a clustered index on the IDENTITY column, the
increased loading time, may be compensated by more effective batching.

Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Ian Boyd - 04 Jul 2008 14:16 GMT
> Assuming that this table has quite some size, it's going to be painful,
> unless you have some good index strategy.

So the generally accepted answer is that there's no nice way to do it as the
DDL is. Which is fine, question answered.

Now, in order to drive Celko nuts, i'll add a meaningless clustered identity
primary key, and do my ranks based on that. It will be a completely
meaningless artificial primary key, and i'll have no data integrity
whatsoever - and in fact it won't even be a table.
Stuart Ainsworth - 04 Jul 2008 14:58 GMT
> Now, in order to drive Celko nuts, i'll add a meaningless clustered identity
> primary key, and do my ranks based on that. It will be a completely
> meaningless artificial primary key, and i'll have no data integrity
> whatsoever - and in fact it won't even be a table.

But according to your posts, you don't have any data integrity now,
nor is it a table.  When you're already in the pool, why are you
worried about getting wet?

Stu
Gert-Jan Strik - 04 Jul 2008 20:42 GMT
> > Assuming that this table has quite some size, it's going to be painful,
> > unless you have some good index strategy.
>
> So the generally accepted answer is that there's no nice way to do it as the
> DDL is. Which is fine, question answered.

[snip]

Yes there is, even with your unnormalized DDL, and others have posted
such solution.

But not if you require it to run on at least 4 generations of SQL Server
(2000, 2005, 2008 and the first version following 2008). How realistic
is that requirement? And are you willing to sacrifies current or future
performance gains for that requirement?

By the way, another (very inefficient) solution is to use a cursor to
loop through your table data, and mark 500 rows one by one.

Signature

Gert-Jan
SQL Server MVP

 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.