Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
DB Engine
SQL ServerMSDESQL Server CE
Services
Analysis (Data Mining)Analysis (OLAP)DTSIntegration ServicesNotification ServicesReporting Services
Programming
CLRConnectivitySQLXML
Other Technologies
ClusteringEnglish QueryFull-Text SearchReplicationService Broker
General
Data WarehousingPerformanceSecuritySetupSQL Server ToolsOther SQL Server Topics
DirectoryUser Groups
Related Topics
MS AccessOther DB ProductsMS Server Products.NET DevelopmentVB DevelopmentJava DevelopmentMore Topics ...

SQL Server Forum / General / Data Warehousing / August 2005

Tip: Looking for answers? Try searching our database.

Correcting old data

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Wreck - 17 Aug 2005 06:22 GMT
Hi,

What's the best way to handle data reloads when you have implemented Type 2
slowly changing dimension?

For example, you might have an Account dimension, with a status of open or
closed. Someone accidentally closes the account, which we then extract and
update, creating a new record. They then realise the mistake and re-open the
account. The next extract would be create a third record, with the original
value.

On top of this, fact data may have been loaded against the second (invalid)
record.

If the source system doesn't keep a history of these changes, then we can't
redo the extract because it will contain a mix of valid and invalid changes.

Thanks,
Wreck.
Myles.Matheson@gmail.com - 17 Aug 2005 08:07 GMT
Hello Wreck,

Have you considered type 1 for dimensions that are affected by this
issue? This way the transactions are not effected. You can have a
mixture of type 1 and 2 in your stars.

Check out:
http://bi-on-sql-server.blogspot.com/2005/07/news-group-post-slowly-changing.html

I wrote a blog on type 2 and type 1.

Type 2 only works at the point of time for the extracts as you
mentioned.  One way of resolving this problem with dirty changes is to
take more frequent extracts to capture more of the changes.

Another option is to remove dirty changes from the dimension if there
are no corresponding fact transactions.

Hope this helps

Myles Matheson
Data Warehouse Architect
http://bi-on-sql-server.blogspot.com/
Peter Nolan - 17 Aug 2005 16:04 GMT
Hi Wreck,
you are entering 'religious territory' ground......

There are two religions here:
1. Go back and correct data 'known to be incorrect'.

2. Retain data known to be incorrect and enter new transactions which
update the DW such that it now represents the data 'known' to be
correct. (Which is really just 'thought to be correct now'.)

I come down in the camp of retain the incorrect data and enter new
transactions such the data is now set to the 'thought to be correct
value'. In your case, the account would be 're-opened' and it would
occur as 're-opened'.

Why?

Because going back and updating data is a very slippery slope. Where
does it end? And if decisions were made against data that was presented
at a certain point in time and then the underlying data is 'corrected'
and people come along to review those decisions and the data says
'these decisions are not well founded on the data' what does one do?

One must remember that data is never, ever 'true' or 'correct' it is
only ever 'thought to be correct/true' which means that data that we
have realised is incorrect and we now believe to be true has the same
likelyhood of being incorrect as the data we just changed.....

Whereas, if we retain the fact that we found data to be incorrect and
we enter transactions to place new transactions to correct that data we
can see the information on which decisions were taken at the time. We
can also see how often data is being corrected.....perhaps there are
underlying issues for a high rate of data corrections occurring such as
lack of proper training in the call centers taking calls and entering
data? It happens....

In some cases it is a legal requirement not to correct historical data.
For example, in most countries it is illegal to update the financial
statements of a company after the close of the annual accounts....yet
clearly there have been some very 'incorrect' entries in the financials
of some very large companies over the last few years.

Best Regards
Peter Nolan
www.peternolan.com
Wreck - 18 Aug 2005 07:17 GMT
Thanks for your feedback guys.

Myles, the system currently has Type I changes across the board, and
naturally doesn't track history very well. I want to implement Type II, but
the data in the source systems has its issues, hence my concerns.

There are steps being taken to improve the data quality in the source systems.

Peter, I'm inclined to agree with you that we don't correct anything,
especially once it's been reported on. However, the business is still fairly
new to the whole warehouse concept - prior to this, it was the usual story of
lots of Access and Excel apps extracting and manipulating data for reporting.
When these data errors are found, they don't really understand why it's so
difficult to make a correction.

Thanks,
Wreck.

> Hi Wreck,
> you are entering 'religious territory' ground......
[quoted text clipped - 41 lines]
> Peter Nolan
> www.peternolan.com
Peter Nolan - 18 Aug 2005 10:50 GMT
Hi Wreck,
I have been doing DWing for 15 years....and I have seen a great many
cases of significant destruction of value within a company on the basis
that 'the business users do not understand'.........and I have seen
even more because 'the IT people do not understand'.!!!... ;-)

Indeed, it seems that those who 'do not understand' seem to think the
onus is on those who 'do understand' to explain to them what it is that
is not understood to persuade them not to destroy value in their
company.  Those who 'do not understand' do not seem to think the onus
is on them to make an effort to understand to assist them create value
in their company....and so value is commonly destroyed.....it would be
funnier if it was not so common....(LOL)!!

You are lucky in that today there are tons of materials available on
reasonable practices for DW development.

However, what is published is never the 'leading edge' and is rarely
'best practice'.

'Leading edge' and 'Best Practice' are retained within companies for
competitive advantage...;-)

Best Regards
Peter Nolan
www.peternolan.com
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.