Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / General / Other SQL Server Topics / April 2008

Tip: Looking for answers? Try searching our database.

SELECT * vs SELECT col1, col2

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Danny Burton - 17 Sep 2007 21:25 GMT
I know that in order to reduce network traffic you should always select only
the columns you require but I have a stored procedure which must **always**
return all columns from a specific table.

When returning all columns from a table is there any performance gain (or
any other reason) for using SELECT col1, col2 etc... instead of just SELECT
* ?

I can immediately think of a disadvantage which is that if you include named
columns in your SELECT then you must update the stored procedure each time a
column is added to the table, using SELECT * gets around this so the stored
procedure will never need to be modified.

Would anyone care to share their thoughts on the above?

Thanks,

Clive
Leythos - 17 Sep 2007 22:06 GMT
> I know that in order to reduce network traffic you should always select only
> the columns you require but I have a stored procedure which must **always**
[quoted text clipped - 10 lines]
>
> Would anyone care to share their thoughts on the above?

Imagine if you returned columns that the application requesting the info
didn't know how to handle. Select * means you didn't know what you were
expecting on one side or the other or both....

Signature

Leythos
- Igitur qui desiderat pacem, praeparet bellum.
- Calling an illegal alien an "undocumented worker" is like calling a
 drug dealer an "unlicensed pharmacist"
spam999free@rrohio.com (remove 999 for proper email address)

David Portas - 17 Sep 2007 22:11 GMT
>I know that in order to reduce network traffic you should always select
>only the columns you require but I have a stored procedure which must
[quoted text clipped - 14 lines]
>
> Clive

When doing maintenance or making changes it's often useful to do a search
over your code to find exactly where a column is being referenced. If you
use * rather than column names then you will be less likely to find
potential dependencies in your code. That can make debugging a much harder
task.

In some environments it can be difficult to maintain total control over the
logical column order returned by *. For example it's easy to re-create a
table in development (maybe even by accident) but it's probably totally
unacceptable to do that in production. So if you use SELECT * you may get
columns returned in some unexpected order, which could be a problem
depending on how your client process consumes that data. By the way, I don't
recommend that data be processed based on fixed column order anyway, but
based on your remark about not wanting to make code changes when you add new
columns I assume that must be what you are doing.

Some code may break or give the wrong results due to column order
dependencies or new or missing columns. For example UNION or CHECKSUM(*) or
INSERT without a column list. Regarding INSERT, how will you populate your
table if you don't know what columns exist in it?.

It is also a foundation of relational database design, that columns are
identified by name and not position. The compromise you are suggesting is
only possible at all because SQL violates that principle.

In summary, experience suggests it is much less trouble to maintain code
with column names than without them.

One other consideration. One day someone else will review or even inherit
your code. Using SELECT * in your code just guarantees an oppotunity for
someone to point the finger and tut-tut about best practices and coding
standards...

Signature

David Portas

Erland Sommarskog - 17 Sep 2007 22:12 GMT
> I know that in order to reduce network traffic you should always select
> only the columns you require but I have a stored procedure which must
> **always** return all columns from a specific table.

Unless this is a temp table created within the procedure, I would
question the wise in this. A question that I often have reason to
ask about our database is "is this column actually used for something".
It is not uncommon that I find a couple of procedures that return this
column, but I also see that they return of columns that are of no
interest in that context, so it looks like "let's select all columns
while we're at it, in case we need them later".

A stored procedure should in my opinion only return the column that
the caller actually requests.

> When returning all columns from a table is there any performance gain
> (or any other reason) for using SELECT col1, col2 etc... instead of just
> SELECT * ?

No.

> I can immediately think of a disadvantage which is that if you include
> named columns in your SELECT then you must update the stored procedure
> each time a column is added to the table, using SELECT * gets around
> this so the stored procedure will never need to be modified.

As I said, I question the wise in this. And what if the colunms in the
tables are rearranged, renamed or a column is dropped? With SELECT *,
the procedure will continue to run, but the client may be utterly
confused. With listed columns, you would get an error if columns are
renamed or dropped. And if they are merely rearranged, the client
will not even notice.


Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Hugo Kornelis - 19 Sep 2007 20:03 GMT
>When returning all columns from a table is there any performance gain (or
>any other reason) for using SELECT col1, col2 etc... instead of just SELECT
>* ?

Hi Clive,

I have to contradict Erland here - there will be a small performance
hit, that might (in very rare cases) grow to be a annoying enough to be
noticeable.

For SELECT *, the first step in parsing is to find out which columns are
in the table. That requires a read on the syscolumns system table in SQL
Server 2000, or it's undocumented equivalent in SQL Server 2005. That in
itself means that you'll have a small bit of overhead, due to requesting
a lock, getting it, reading the data, and releasing the lock again.

If this happens frequently in a system that also frequently adds or
removes columns to of from tables, you might see blocking on the system
table, since the Sch-S (schema stability) lock taken for reading the
columns can't co-exist with the Sch-M (schema modification) lock taken
for changing the table structure.

Signature

Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

Erland Sommarskog - 19 Sep 2007 22:10 GMT
> For SELECT *, the first step in parsing is to find out which columns are
> in the table. That requires a read on the syscolumns system table in SQL
> Server 2000, or it's undocumented equivalent in SQL Server 2005. That in
> itself means that you'll have a small bit of overhead, due to requesting
> a lock, getting it, reading the data, and releasing the lock again.

But if you list all columns in the table, the optimizer still has to
read all rows in sys.columns for the table verify that all columns
exists, to find their data types etc. So the amount of locking would be
the same.

Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Serge Rielau - 19 Sep 2007 22:45 GMT
>> For SELECT *, the first step in parsing is to find out which columns are
>> in the table. That requires a read on the syscolumns system table in SQL
[quoted text clipped - 6 lines]
> exists, to find their data types etc. So the amount of locking would be
> the same.
Uhmm.. I don't think I'd want to confuse the official information schema
with what the DBMS uses for itself. You assume there is neither
compacting of the decomposed information into one object (like a LOB)
nor caching of the schema in memory.

Cheers
Serge
Signature

Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

Erland Sommarskog - 20 Sep 2007 22:10 GMT
> Uhmm.. I don't think I'd want to confuse the official information schema
> with what the DBMS uses for itself. You assume there is neither
> compacting of the decomposed information into one object (like a LOB)
> nor caching of the schema in memory.

I would guess that SQL Server caches system tables more or less like
either tables. That would at least make most sense to me. DB2 may be
different.

And caching or not, the metadata changes requires a lock. I did this in
one window:

  begin transaction
  alter table Orders add  xyz int null

And in another window I tried to running SELECTs against the Orders table,
and it no difference if I had "SELECT *" or if I selected a small number
of columns. The query blocked.

Again, this may be different in DB2.

Signature

Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

cphite@gmail.com - 19 Sep 2007 21:21 GMT
> I know that in order to reduce network traffic you should always select only
> the columns you require but I have a stored procedure which must **always**
> return all columns from a specific table.

Reducing traffic really isn't the main reason.

> When returning all columns from a table is there any performance gain (or
> any other reason) for using SELECT col1, col2 etc... instead of just SELECT
> * ?

Simply put, when you are writing production code you should KNOW for
certain exactly how and why everything works.  You should know exactly
which columns are going to be returned; even if it happens to be all
of them.

> I can immediately think of a disadvantage which is that if you include named
> columns in your SELECT then you must update the stored procedure each time a
> column is added to the table, using SELECT * gets around this so the stored
> procedure will never need to be modified.

Yes, you will have to modify the procedure in that case.  However,
while that does require that you spend time actually doing work, it is
still preferable to a situation where somebody changed a table
somewhere and your stored procedure keeps "working" despite returning
the wrong data.  If all they did is add a column you might be okay -
what if they changed the order of some columns?  What if they dropped
a column your application was expecting?

While there may not be a significant performance gain, there might
well be gains in terms of not getting phone calls in the middle of the
night because some application crashed, or because some job failed
when a table was changed earlier in the day.

> Would anyone care to share their thoughts on the above?

Using SELECT * in a procedure, view, or whatever is generally just
lazy and sloppy.
Knowledgy - 19 Jan 2008 12:08 GMT
Adding columns to tables frequently points to a poor database design.

Signature

Sincerely,
John K
Knowledgy Consulting
http://knowledgy.org

Atlanta's Business Intelligence and Data Warehouse Experts

>I know that in order to reduce network traffic you should always select
>only the columns you require but I have a stored procedure which must
[quoted text clipped - 14 lines]
>
> Clive
Serge Rielau - 19 Jan 2008 15:23 GMT
> Adding columns to tables frequently points to a poor database design.
... or a vibrant changing company with new requirements in their IT...
I see no harm in adding columns. Things that don't change are dead...

Signature

Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

Brian Bunin - 27 Jan 2008 18:02 GMT
>Adding columns to tables frequently points to a poor database design.

It must be nice to work for a company that doesnt change the requirements
either mid development, or after it's deployed.
Knowledgy - 29 Feb 2008 01:51 GMT
It's called abstraction and proper database design

Signature

Sincerely,
John K
Knowledgy Consulting
http://knowledgy.org/

Atlanta's Business Intelligence and Data Warehouse Experts

>>Adding columns to tables frequently points to a poor database design.
>
> It must be nice to work for a company that doesnt change the requirements
> either mid development, or after it's deployed.
jhofmeyr@googlemail.com - 03 Mar 2008 13:11 GMT
> It's called abstraction and proper database design

BI != transactional DB

And even if it were, in my experience it is very (very very) rare that
a BI system doesn't evolve over time.  New business areas, new
dimension attributes, new hierarchies, performance optimisations, etc,
etc...
David Portas - 29 Apr 2008 19:30 GMT
> It's called abstraction and proper database design

What are you suggesting? That one should always create a new table when
adding a new attribute to the data model? Maybe use binary relations or 6th
Normal Form?

Even using 6NF you would still need to add to an existing table if you
needed to add an attribute to make a compound key. You could create a new
copy of the table instead I suppose but I fail to see why that would be
preferable to adding an attribute to an existing table.

Signature

David Portas

 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.