Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / DB Engine / SQL Server / March 2008

Tip: Looking for answers? Try searching our database.

Varchars/Chars vs Int/BigInt as keys

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
ryan  putman - 03 Mar 2008 20:21 GMT
This conversation came up and I was wondering if anyone had some
definitive proof one way or another....

If you have a key column for a table and that column will be used as a
reference/foreign key to other tables often and frequently but also
could reach values well into the millions at what point (if ever) do
you opt for a char/varchar key as opposed to an int/bigint?

If the choice does become char/varchar at some point, are there
particular advantages to using char vs. varchar?  In the system under
question, the id values are currently between 7 and 11 characters
long......so no really small values or any too large neither.

Thanks in Advance
Ryan Putman
Andrew J. Kelly - 04 Mar 2008 01:31 GMT
The decision to use char vs. int should depend on what the type of data is.
If the values are always going to be numbers then why make it a char or
varchar when an INT is what it really is?  INT's will always be more
efficient than a 12 character datatype since the INT will only take up 4
bytes and the other 12.

Signature

Andrew J. Kelly    SQL MVP
Solid Quality Mentors

> This conversation came up and I was wondering if anyone had some
> definitive proof one way or another....
[quoted text clipped - 11 lines]
> Thanks in Advance
> Ryan Putman
ryan  putman - 04 Mar 2008 13:26 GMT
On Mar 3, 8:31 pm, "Andrew J. Kelly" <sqlmvpnooos...@shadhawk.com>
wrote:
> The decision to use char vs. int should depend on what the type of data is.
> If the values are always going to be numbers then why make it a char or
[quoted text clipped - 4 lines]
> --
> Andrew J. Kelly    SQL MVP

hmm...thanks for the reply but apparently I didn't state my question
clearly enough.

The column in question is just some key column.  The question is
totally hypothetical, not analytical.  If I have a table that could
eventually reach well into the millions in rows, should I use a
standard integer (or bigint) identity column as a primary key or
should I use a generated var/char column.

The integer should provide faster linking and searching initially but
is there a point where have a generated varchar key like A283CD is
better than an integer ID of 2,828,828?
Dan Guzman - 04 Mar 2008 13:56 GMT
> hmm...thanks for the reply but apparently I didn't state my question
> clearly enough.
>
> The column in question is just some key column.

I think Andrew understood your question but you may not have understood his
answer.  The key datatype is determined by the underlying data.  For
example, if you have a numeric CustomerID, then an integer type is
appropriate.  A char (or varchar) is appropriate for alphanumeric
CustomerID.

> The integer should provide faster linking and searching initially but
> is there a point where have a generated varchar key like A283CD is
> better than an integer ID of 2,828,828?

Since you mention "generated", I think you are specifically referring to a
surrogate key.  A surrogate key is an artificial value used as an
alternative to the natural key.  For a surrogate key data type, integers are
most commonly used in SQL Server because values are easily generated using
IDENTITY and provide good performance an low storage requirements.
Uniqueidentifiers are used in cases where the value needs to be globally
unique.  Character values are used only when business requirements dictate.

When you use surrogate keys, it is important that you also have a constraint
on the natural key to ensure uniqueness and prevent duplicate rows.

Signature

Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

> On Mar 3, 8:31 pm, "Andrew J. Kelly" <sqlmvpnooos...@shadhawk.com>
> wrote:
[quoted text clipped - 20 lines]
> is there a point where have a generated varchar key like A283CD is
> better than an integer ID of 2,828,828?
ryan  putman - 04 Mar 2008 19:51 GMT
On Mar 4, 8:56 am, "Dan Guzman" <guzma...@nospam-online.sbcglobal.net>
wrote:

> I think Andrew understood your question but you may not have understood his
> answer.  The key datatype is determined by the underlying data.  For
> example, if you have a numeric CustomerID, then an integer type is
> appropriate.  A char (or varchar) is appropriate for alphanumeric
> CustomerID.

There is no underlying data.  As I said, this is mainly a hypothetical
question.

A colleague of mine made the point that if I was going to have a table
with a very large number of rows (into the millions) that a char-based
generated ID value would be more efficient than an Integer
value......which I doubted because INTs have always been faster in my
experience.

There is no data.  Just the question.....is there ever a point where
INTs become less efficient because of how large they get versus a
char.  I think they you guys have basically said that chars are never
faster.....which is the answer I was looking for.....everyone just
seemed to get hung up on the thought that there was a pre-existing
data format already in place or something.
ryan  putman - 04 Mar 2008 20:01 GMT
To provide an example....

The identity seeded integer style ID would have values like this -
1,2,3,4,5,6,7,8,9,10,11,12,13,14....34,35,36,37......45,46,47
The generated char ID would have values like this -
1,2,3,4,5,6,7,8,9,A,B,C,D,E..........Y,Z,10,11.........19,1A,1B

Obviously, you would eventually be storing a much smaller looking char
value compared to the INT value once the number of rows is into the
millions but the question is still is that char more efficient or
better in any way than the INT?
Andrew J. Kelly - 04 Mar 2008 21:36 GMT
A char or varch takes up 1 byte for each character. An INT takes up 4 bytes
regardless of the value. So as soon as the CHAR value get to 5 characters it
becomes larger than the INT and that would happen well before you hit the
millions let alone billions. And the INT is still more efficient than the
same size CHAR.  In SQL2008 the INT can even take up less space if the
values are not near the max.

Signature

Andrew J. Kelly    SQL MVP
Solid Quality Mentors

> To provide an example....
>
[quoted text clipped - 7 lines]
> millions but the question is still is that char more efficient or
> better in any way than the INT?
Tom Cooper - 04 Mar 2008 21:56 GMT
> Obviously, you would eventually be storing a much smaller looking char
> value compared to the INT value once the number of rows is into the

Smaller "looking" char value is right.  It only looks smaller.  Remember
that a varchar takes 2 extra bytes in your table.  So the characters for the
first 36 rows
0,1,2,3,4,5,6,7,8,9,A,B,C,D,E..........Y,Z each take 3 bytes to store which
is slightly better than an int which takes 4 bytes to store.
The next 1296 rows (00 thru ZZ) each take 4 bytes
The next 46,656 rows 000 thru ZZZ each take 5 bytes
The next 1,679,616 rows 0000 thru ZZZZ each take 6 bytes
The next 60,466,176 rows 00000 thru ZZZZZ each take 7 bytes
The next 2,176,782,336 rows 000000 thru ZZZZZZ each take 8 bytes
But each one of those rows could be assigned an int that only took 4 bytes.
So, as you can see, using varchar vs using an int will actually cost you
storage space once you have more than about 1300 rows.

And searching/comparisions/joining on an int value is faster than a varchar
value.

And using an identity attribute on the int to generate each value is much
easier and straight forward that the code you would need to find the next
varchar value when a row was inserted.

It's a slam dunk. Use an identity column and choose between smallint, int,
and bigint depending on how many rows could be added to the table.

Tom

> To provide an example....
>
[quoted text clipped - 7 lines]
> millions but the question is still is that char more efficient or
> better in any way than the INT?
ryan  putman - 05 Mar 2008 15:27 GMT
Thanks, everyone, for your replies.

They were definitely helpful.

Not sure I can convince the people in charge but at least now I know
I'm right :)
Tom Cooper - 05 Mar 2008 19:05 GMT
If they won't listen to you, suggest they run a test.  For example, a simple
test would be create a table with an integer primary key with the identity
clause and insert some number of rows into it and and see how long it takes
and how much space the table takes up when you're done.  Then do the same
thing with a table using a varchar key with the scheme you outlined and see
how long that takes and how much space that table takes up.  For example,
for the identity column and 100,000 rows, I ran

Create Table TestInt (PK int identity Primary Key)
go
Declare @Time1 datetime
Declare @Time2 datetime
Declare @InsertCounter int
Set @InsertCounter = 1
Set @Time1 = Current_Timestamp
While @InsertCounter <= 100000
Begin
 Insert TestInt Default Values
 Set @InsertCounter = @InsertCounter + 1
End
Set @Time2 = Current_Timestamp
Select DateDiff(ms, @Time1, @Time2) As ElaspedMS
Exec sp_spaceused 'TestInt', 'True'

(I know that's not the fastest way to insert 100,000 rows, but since I
presume your real system will insert the rows one at a time, I had this test
do that also.)  This test on my machine took about 50 seconds, and the
sp_spaceused said reserved space was 1,352 KB, data 1,288 KB, index 16 KB
and unused 48 KB.

With the varchar key, it's more complicated.  You need a way to store the
next key (I added a table with one row), a way to compute the next key (I
have a stored proc, please note this should not be used in production, it's
not bullet proof at all, it doesn't for example, handle concurrent requests
for the next key).  But this is only a test, so I was keeping it simple.  So
that code was

Create Table TestVC (PK varchar(6) Primary Key)
Create Table NextVC (PK int Primary Key, NextVCPK varchar(6))
Insert NextVC (PK, NextVCPK) Values (1, '1')
go
Create Procedure GetNextKey @NextKey varchar(6) Output As
Begin
 Declare @NewNextKey varchar(6)
 Declare @CharToProcess tinyint
 Declare @CharAsciiValue tinyint
 Declare @Done tinyint
 Select @NextKey = NextVCPK From NextVC Where PK = 1
 Set @NewNextKey = @NextKey
 Set @CharToProcess = Len(@NewNextKey)
/* if value is all Z's, Z or ZZ or ZZZ, etc
 then next value is a 1 followed by the appropriate number of zeros */
 If @NewNextKey In ('Z', 'ZZ', 'ZZZ', 'ZZZZ', 'ZZZZZ')
 Begin
   Set @NewNextKey = '1' + Replicate('0', @CharToProcess)
 End
 Else
 Begin
   Set @Done = 0
   While @Done = 0
   Begin
     Set @CharAsciiValue = Ascii(Substring(@NewNextKey, @CharToProcess, 1))
     /* Find the next value for this character */
     Set @CharAsciiValue = Case
   /* change 0-8 to the next number, 9 to A, A-Y to Z and Z to 0 */
       When @CharAsciiValue Between 48 And 56 Then @CharAsciiValue + 1
       When @CharAsciiValue = 57 Then 65
       When @CharAsciiValue Between 65 And 89 Then @CharAsciiValue + 1
       Else 48 End
     Set @NewNextKey = Stuff(@NewNextKey, @CharToProcess, 1,
Char(@CharAsciiValue))
     If @CharAsciiValue <> 48 Set @Done = 1 /* we are done */
     Else Set @CharToProcess = @CharToProcess - 1
   End
 End
 Update NextVC Set NextVCPK = @NewNextKey Where PK = 1
End
go
Declare @Time1 datetime
Declare @Time2 datetime
Declare @InsertCounter int
Declare @NextKey varchar(6)
Set @InsertCounter = 1
Set @Time1 = Current_Timestamp
While @InsertCounter <= 100000
Begin
 Exec GetNextKey @NextKey Output
 Insert TestVC (PK) Values (@NextKey)
 Set @InsertCounter = @InsertCounter + 1
End
Set @Time2 = Current_Timestamp
Select DateDiff(ms, @Time1, @Time2) As ElaspedMS
Exec sp_spaceused 'TestVC', 'True'

This took about 120 seconds, and the sp_spaceused said reserved space was
2,568 KB, data 2,504 KB, index 16 KB and unused 48 KB.

So about twice as long to load the varchar keys, twice as much space, and
much more complicated coding.

Tom

> Thanks, everyone, for your replies.
>
> They were definitely helpful.
>
> Not sure I can convince the people in charge but at least now I know
> I'm right :)
Andrew J. Kelly - 05 Mar 2008 19:14 GMT
And don't forget that every index you add will be that much larger as well.
That means more network and disk I/O and less that will fit into cache.

Signature

Andrew J. Kelly    SQL MVP
Solid Quality Mentors

> If they won't listen to you, suggest they run a test.  For example, a
> simple test would be create a table with an integer primary key with the
[quoted text clipped - 105 lines]
>> Not sure I can convince the people in charge but at least now I know
>> I'm right :)
TheSQLGuru - 04 Mar 2008 14:44 GMT
Andrew hit the nail on the head.  int WILL be more efficient for your
application as stated.

Also, don't waste the extra 4 bytes for a bigint if millions are all you
expect.  int can have -2.1B to +2.1B entries.  so unless you expect to hit
4BILLION + entries go with int identity seeded at the negative limit.

Signature

Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net

> On Mar 3, 8:31 pm, "Andrew J. Kelly" <sqlmvpnooos...@shadhawk.com>
> wrote:
[quoted text clipped - 20 lines]
> is there a point where have a generated varchar key like A283CD is
> better than an integer ID of 2,828,828?
Andrew J. Kelly - 04 Mar 2008 14:49 GMT
I think Dan answered the questions but I have one more comment. I still
don't get why you feel the need to use a Varchar over an INT just because of
the number of rows.  An INT will hold plus and minus over 6 billion values
with only taking up 4 bytes and will always be more efficient in storage and
searching than the equivalent character datatype would be.

Signature

Andrew J. Kelly    SQL MVP
Solid Quality Mentors

> On Mar 3, 8:31 pm, "Andrew J. Kelly" <sqlmvpnooos...@shadhawk.com>
> wrote:
[quoted text clipped - 20 lines]
> is there a point where have a generated varchar key like A283CD is
> better than an integer ID of 2,828,828?
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.