Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / General / Data Warehousing / September 2005

Tip: Looking for answers? Try searching our database.

Header Vs Transaction Table

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
marcmc - 21 Sep 2005 13:07 GMT
Hey All,
Please have a quick read and add your opinions.

I work in the insurance industry.
In our datawarehouse there's a policy header table and a policy transaction
header.
The description of each policy number is in the form 'ABC/XYZ/1234567'

My problem is the header table holds this value as a varchar(15) and the
transaction table varchar(19). This means in about a year and a half by my
business estimates we will break the 'ABC/XYZ/9999999' boundary on the header
table. This is seriously bad news as well as bad initial design by a
reputable 3rd party.

I have also found that development over the years has led to alot of
programs been develeped and this issue was overlooked with regard to the
variables used to massage these values into our warehouse.

I am writing the spec for this change and have taken the approach of check
dependant objects across all databases for the policy number, rebuild
tables/views where necessary, alter table alter column for tables and changes
to the program variables that are too small to bring in the 16 character
number after 'ABC/XYZ/9999999'.

Is my approach correct, are there any tips for a job so little but yet so big?
Peter Nolan - 21 Sep 2005 15:17 GMT
Marcmc,
this is a very, very standard problem......and no...it is not a small
problem...it points directly to one of the toughest problems in data
warehousing....data integration...

The standard problem more generally stated is:

How do I effectively integrate and maintain relationships between data
that provides real keys that may change or be re-used over time?

For example, telephone numbers get longer, and they are constantly
re-used which is not a problem for the operational system because each
number is unique at a point in time but it is a BIG problem for the EDW
because it must understand that these things change over time....

The very standard answer is...if you are building  dimensional DW, all
keys are best as generated meaningless integers.....this means that
somewhere there must be translation tables translating real keys coming
in to meaningless integers and you must have some standardised
mechanism to allocate these keys.....

Where multiple pieces of data can arrive with multiple different
contents but represent the same thing you must have another translation
table converting them to an intermediate meaningless intger key which
is then itself converted to yet another meaningless integer key...of
course, the standard example here is customer keys coming from
different source systems.....since you work in Insurance you will be
well familiar that different policy management systems often allocate
their own customer numbers...

I have published the code to maintain keys in dimension tables in C so
that most people can read it...it's on www.peternolan.com....I have not
published the mechanism of using a cross reference table for things
like customer numbers but it's pretty trivial.....

So, a good question.... ;-)

Peter
www.peternolan.com
marcmc - 21 Sep 2005 17:23 GMT
Thankyou Peter, may I just add the following:

The policy number is not a key, we have done all the donkey work regarding
keys & meaningless integers.
They are constructed on an AS400 system, we merely dts them in and bring
them through to their final resting place as a description along another
generated key from surrogates.
They are stored seprately and a contructor is used to piece them together
with as part of our run.
We get the breakdown and the full version in our staging tables.

Merging Multiple codes is not an option.
I have done an extensive spec
I am not going to update any data, just increase the size of the 100 or so
fields, rebuild views, mod programs etc. I think this is the safest and maybe
the only option?

My biggest issue at the moment is I have no contingency plan worked out
apart from backing up the tables. By the way this could be impossible due to
space restrictions. Our largest table is almost 18Gb.

Any more ideas?
Peter Nolan - 24 Sep 2005 22:53 GMT
Hi Marcmc,

"The description of each policy number is in the form 'ABC/XYZ/1234567'
"

Are you saying you already have an integer that links these
descriptions/policy numbers together?  I'm not sure I see your problem
then...if you can uniquely identify a policy over time you will be
ok...if this description gets truncated and the integer key is still
accurate and unique over time you can merely have one description
'survive' the other....

For contingency, partitioing you largest tables on time and backing
them up only when they have changes is usual...but 18GB?? Whats' the
problem in backing that up? Just buy more disk, back it up and compress
the backed up files.....sql server will dump 18GB data in no time at
all...

Peter
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.