Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / Programming / SQL / July 2008

Tip: Looking for answers? Try searching our database.

To NULL or not to NULL

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Elkestra - 29 Jul 2008 13:38 GMT
Purely theoretical while I'm finalising the design currently, so DDL isn't
directly relevant unless requested.

I'm currently writing a recipe database for an online game. Each recipe is
assigned a level within its appropriate skill, at which it becomes
available. When data is entered, initially some recipes will be unassigned
level-wise.

My first though was to have the level column (bigint) nullable, to indicate
that the level wasn't yet known, but having read all the anathema for NULL
values, would I be better making it NOT NULL, and then have it default to
0 - whilst have a bit-column flagging its 'level unknown' status, since 0
can also mean 'always available'.

Related to this, how would I deal with the recipe's skill-type if currently
unassigned. Again, first thought would be NULL, but would having an
'[Unassigned]' skill-type in the skills table be a better method?

Yours,

Elkestra
(Ann-Marie)
Rick Sawtell - 29 Jul 2008 14:03 GMT
> Purely theoretical while I'm finalising the design currently, so DDL isn't
> directly relevant unless requested.
[quoted text clipped - 18 lines]
> Elkestra
> (Ann-Marie)

In my design world, I would use a lookup table with a FK constraint as
follows.

CREATE TABLE dbo.SkillLevel (
SkillLevelID int IDENTITY (1,1) NOT NULL PRIMARY KEY CLUSTERED,
SkillLevelDescription varchar(50) NOT NULL
)

INSERT dbo.SkillLevel (SkillLevelDescription) VALUES ('Unknown')
INSERT dbo.SkillLevel (SkillLevelDescription) VALUES ('1st Level')
INSERT dbo.SkillLevel (SkillLevelDescription) VALUES ('2nd Level')

CREATE TABLE dbo.Recipe (
RecipeID int IDENTITY (1,1) NOT NULL PRIMARY KEY CLUSTERED,
RecipeName  varchar(50) NOT NULL,
RecipeDescription varchar(50) NOT NULL,
SkillLevelID int NOT NULL,

CONSTRAINT FK_Recipe_SkillLevel FOREIGN KEY (SkillLeveLID) REFERENCES
dbo.SkillLevel(SkillLevelID)
)

CREATE TABLE dbo.Components (
ComponentID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
ComponentName varchar(50) NOT NULL,
)

CREATE TABLE dbo.RecipeComponents (
RecipeID INT NOT NULL,
ComponentID INT NOT NULL,

CONSTRAINT FK_RecipeComponents_Recipe FOREIGN KEY (RecipeID) REFERENCES
dbo.Recipe(RecipeID),
CONSTRAINT FK_RecipeComponents_Components FOREIGN KEY (ComponentID)
REFERENCES dbo.Components(ComponentID)
)

Rick Sawtell
Roy Harvey (SQL Server MVP) - 29 Jul 2008 14:15 GMT
For all the discussion about NULLs it really isn't that big a deal. It
actually can work quite well for missing information, which is what
you appear to have.  Assigning a value that represents the same thing
is another perfectly valid answer.  I dislike the idea described that
0 actually have two very different meanings depending on the flag, so
I would be looking for some other approach.

If the recipe skill type means a recipe is only associated with one
skill then it is really the same question.  If, on the other hand, a
recipe can be associated with N skills I would suggest a "junction
table", a table with two columns, the PK of Recipes and the PK of
Skills, which together make up the PK of the RecipeSkill table.  In
that case instead of NULL or some special code you simple don't have
any row for a specific recipe.

I have to mention that the idea of using a bigint to represent what
sounds like a skill level, well that seems a bit extreme, but of
course I don't understand the requirements.

Roy Harvey
Beacon Falls, CT

>Purely theoretical while I'm finalising the design currently, so DDL isn't
>directly relevant unless requested.
[quoted text clipped - 18 lines]
>Elkestra
>(Ann-Marie)
Razvan Socol - 29 Jul 2008 14:38 GMT
Hello, Elkestra

If the level is unknown, I would use NULL in the Level column, so it will
not appear in any level. If you want to say 'always available', I would
use 0 in the Level column. Therefore, both values are acceptable, but with
different meanings. When the data entry is complete, you may want to check
for null values and assign them to a certain level or to 0 (always
available). And maybe change the definition of the column to disallow
NULLs thereafter. Anyway, I would not use a "LevelUnknown" bit column.

Signature

Razvan Socol
SQL Server MVP

--CELKO-- - 29 Jul 2008 21:51 GMT
>>  Each recipe is assigned a level within its appropriate skill, at which it becomes available. When data is entered, initially some recipes will be unassigned level-wise. <<

>> My first though was to have the level column (bigint) nullable, to indicate that the level wasn't yet known, but having read all the anathema for NULL values, would I be better making it NOT NULL, and then have it default to 0 - whilst have a bit-column flagging its 'level unknown' status, since 0 can also mean 'always available'.  <<

I assume that the skill_level has a range, like 0 to 10 or 100. Why
are you using BIGINT??? That is more than the number of atoms in the
Universe!  You are inviting garbage data.  And, yes, zero is the right
way to go, not a NULL.

>> Related to this, how would I deal with the recipe's skill_type if currently unassigned. Again, first thought would be NULL, but would having an '[Unassigned]' skill_type in the skills table be a better method?  <<

Yes it is, but why do you have a recipe without a skill_type -- BBQ,
baking, sauces, deserts, vegetables, fish, etc. ought to be known at
the start.  Maybe I don't understand the problem.

Get a copy of my book SQL PROGRAMMING STYLE and read chapter 5 on the
design of encoding schemes.
Hugo Kornelis - 30 Jul 2008 23:10 GMT
>My first though was to have the level column (bigint) nullable, to indicate
>that the level wasn't yet known, but having read all the anathema for NULL
[quoted text clipped - 5 lines]
>unassigned. Again, first thought would be NULL, but would having an
>'[Unassigned]' skill-type in the skills table be a better method?

Hi Elkestra / Ann-Marie,

I have to disagree with Rick Sawtell and Joe Celko, and agree with
Razvan Socol. Forget what you have read about NULLs being bad, or hard,
or whatever. They are not.

What *IS* hard, is dealing with missing information. You have to find
some way to represent that in the table; then yoou have to make sure
that all queries properly understand and deal with that representation
(for instance, a query for the average level should not take into
account the recipes with 0 if you use 0 to represent unknown difficulty,
otherwise your result is skewed).

In my experience, NULLs may not be the best solution for the missing
information problem, but all other solutions are far worse, making NULLs
the answer of my choice to this problem. Unless, of course, I can tackle
it at the root and ensure that no information will be missing - but that
is in many cases way beyond the control of the lowly DBA or database
developer.

Most of the problems people experience with NULLs fall in either of
these two categories:

1) They are actually problems with missing information. Using some magic
value to represent the missing information instead of NULL will not
solve the problem.

2) They are caused by improper design (and esp. improper use of NULLs in
the database design), or by insufficient understanding of what NULLs are
and how they affect query results.

If you use NULLs to represent the fact that some specific bit of
information (like, for instance, the level of a recipe) is missing in
your database, you should just be fine - as long as you make sure that
you know how NULLs affect query results.

<shameless self-plug>

Recommended reading:
http://sqlblog.com/blogs/hugo_kornelis/archive/2007/07/06/null-ndash-the-databas
e-rsquo-s-black-hole.aspx

http://sqlblog.com/blogs/hugo_kornelis/archive/2007/07/17/the-logic-of-three-val
ued-logic.aspx

http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/22/dr-unknown-or-how-i-le
arned-to-stop-worrying-and-love-the-null.aspx

http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/30/what-if-null-if-null-i
s-null-null-null-is-null.aspx


</shameless self-plug>

Signature

Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

--CELKO-- - 30 Jul 2008 23:19 GMT
>> 1) They are actually problems with missing information. Using some magic
value to represent the missing information instead of NULL will not
solve the problem. <<

I use these heuristics:

1) Avoid them if you can. Use a special value and put it into a
DEFAULT
clause, if the domain of your column has it.

2) If you have a NULL, make sure it has one and only one meaning in
that
domain. Example: The ICD codes for disease use 000.000 for
"Undiagnosed, you just got to the hospita" and 999.999 for "Diagnosed,
and we don't know what it is"
for two very different kinds of missing data.

3) Make sure the propagation property of NULLs makes sense in your
data
model. Otherwise, use COALESCE() to replace it with another value.

4) Remember that you must use "CAST (NULL AS <datatype>)" in places
where the compiler needs to know the datatype of a column or variable.
Hugo Kornelis - 31 Jul 2008 15:03 GMT
>>> 1) They are actually problems with missing information. Using some magic
>value to represent the missing information instead of NULL will not
[quoted text clipped - 5 lines]
>DEFAULT
>clause, if the domain of your column has it.

Hi Joe,

Never use magic values. Didn't they teach you anything in database 101?
Or were you that guy that was sleeping through the lessons in the back
of the class?

Next week, business requirements may change and what you thought would
never be a "real" value suddenly does become real. Ouch!

Or you forget to exclude the magical value from an aggregate query or
from a range search and get your results all mixed up. More ouch!

NULL will, by definition, never become a real value in the business,
since it's defined as the absence of any data value. (Check the SQL
standards; I'm sure you have a copy somewhere).

>2) If you have a NULL, make sure it has one and only one meaning in
>that
>domain.

NULLL is defined as "(a marker indicating) the absence of any data
value". What other definition can there be? What other meaning could it
possibly have?

> Example: The ICD codes for disease use 000.000 for
>"Undiagnosed, you just got to the hospita" and 999.999 for "Diagnosed,
>and we don't know what it is"
>for two very different kinds of missing data.

Bad example. First, this is ICD-9. ICD-10 was already completed in 1992
and most of the world has quickly adapted it. And ICD-11 is expected to
be complete in a few years.

But what's worse, is that this is simply not true. The codes you mention
are not defined; instead, there is a code 799.9: "Other unknown and
unspecified cause". It took me no more that five minutes of Googling to
find this site:

ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/ICD9-CM/2007/

Download Dtab08.zip and review the full list of codes. Enjoy!

But besides this beinig a bad example, there is something way more
fundamentally wrong with your argument. A disease is either diagnosed or
not. If it's not diagnosed, then there is no diagnosis. Period.

There may be various reasons for the absence of a diagnosis. Like the
two you mention, or "we have a diagnosis but lacked the time to enter
the information into the computer", or "some new disease that is not yet
defined in ICD-9, so please see the paper trail". Or others. And that
reason might be relevant for the application, or it might not be. If
it's not, then NULL for the diagnosis is all you need. And if the reason
for a missing diagnosis *is* important, than you should define that as a
new column. Because in databases, we never combine multiple attributes
in a single column - that would be a violation of first normal form.
Again, database 101.

>3) Make sure the propagation property of NULLs makes sense in your
>data
>model. Otherwise, use COALESCE() to replace it with another value.

That's just a different way to say what I already said: make sure you
know how NULLs affect query results.

>4) Remember that you must use "CAST (NULL AS <datatype>)" in places
>where the compiler needs to know the datatype of a column or variable.

That is only required if you use a hard-coded NULL in a situation where
the datatype can not be inferred from something else in the query; a
situation I have never seen outside of weird, carefully construed
classroom examples.

Real queries deal with data stored in columns in tables or (sometimes)
in variables. Both columns and tables have a datatype, so the DBMS knows
the datatype whether the value is NULL or not.

Signature

Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

--CELKO-- - 31 Jul 2008 17:29 GMT
>> NULL is defined as "(a marker indicating) the absence of any data value". What other definition can there be? What other meaning could it possibly have?  <<

Using your logic, zero is a number, so it must have the same meaning
in every domain where it is used?  No, of course not.  Zero can mean
"my gas tank is empty" or "this water is frozen" depending on the
domain.  In the case of NULL it might mean "we do not have an end date
for a duration yet", "This customer does not have a FAX machine",
etc.

If you look around, you will find that a lot of encoding schemes have
a "N/A", "miscellaneous", or other special values for different
situations.  The most common example is the NaN values in IEEE
floating point numbers for us.
Hugo Kornelis - 31 Jul 2008 20:42 GMT
>>> NULL is defined as "(a marker indicating) the absence of any data value". What other definition can there be? What other meaning could it possibly have?  <<
>
>Using your logic, zero is a number, so it must have the same meaning
>in every domain where it is used?

Yes. Zero always has the meaning of an amount that is exactly one less
than one (or however mathematicians would define it). Zero should never
have the meaning of sevenhundred and fourty three, or Orange.

>  No, of course not.  Zero can mean
>"my gas tank is empty"

Zero gallons.

Zero means one less than one, and gallons is a unit of measurement for
fluid amounts.

> or "this water is frozen"

Zero degrees centigrade.

Zero means one less than one, and degrees centigrade is a unit of
measurement for temperatures.

Of course, these two zeroes will be interpreted differently by the end
user, but there should not be ANY difference in how the database treats
either of them. Take any value x, add either zero gallons or zero
degrees centigrade to it, and the result should still be equal to x
(even though the addition of extra gallons might make sense if x is also
a fluid amount and the addition of degrees centigrade might not - with
the current support for domains in SQL Server, there's no way anyone can
expect SQL Server to know which one is valid and which one isn't; to SQL
Server zero is just the number zero; nothing more, nothing less.

> depending on the
>domain.  In the case of NULL it might mean "we do not have an end date
>for a duration yet", "This customer does not have a FAX machine",
>etc.

No. It just means that there is no end date known to the database, or
that there is no fax number known to the database.

If the end user happens to know that there will always be a fax number
in the database unless the customer doesn't have one - good for him. But
that does not mean that the MEANING "no fax machine" is somehow encoded
in the NULL, just as the MEANING "degrees centigrade" is not encoded in
the value zero.

>If you look around, you will find that a lot of encoding schemes have
>a "N/A", "miscellaneous", or other special values for different
>situations.  The most common example is the NaN values in IEEE
>floating point numbers for us.

1) Many encoding schemes are designed for use in flat file or XML data
interchanges. They should not be confused with relational principles.
What makes sense in an information interface should not automatically be
copied into the database. That's what staging tables are for.

2) Ain't it funny that of all the examples you give here, all just state
the absence of a real value without giving any reason for WHY the value
is absent?

(Though I'll immediately grant that there are lots of encodings where
magic numbers ARE indeed used to specify various reasons why a regular
value is not on file - like I said, makes sense in XML or flat file, but
not in a properly normalized database).

Signature

Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

Aaron Bertrand [SQL Server MVP] - 31 Jul 2008 15:15 GMT
> 1) Avoid them if you can. Use a special value and put it into a
> DEFAULT
> clause, if the domain of your column has it.

This sounds a lot like the "magic" values you always harp against, such as
the reason 50 and 255 are such popular choices, and why a punch card has 80
columns.  I used to think like this too, but I have since learned that
nothing really good comes from magic numbers made to look like real data
just because NULL raises panic for some reason.

> 2) If you have a NULL, make sure it has one and only one meaning in
> that
> domain. Example: The ICD codes for disease use 000.000 for
> "Undiagnosed, you just got to the hospita" and 999.999 for "Diagnosed,
> and we don't know what it is"
> for two very different kinds of missing data.

Neither of those are really candidates for NULL, since there is SOME data
about it, and NULL is supposed to represent the absolute absence of data.
If this data you cite is important (obviously it is), then it should be part
of the model, period.  Whether that means the ICD_Code column has more
candidate values (probably all 1 million possible codes are in a check
constraint, right?), or the reason for the NULL value is explained in
another column, I am not sure...

A
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.