Archives of the TeradataForum
Message Posted: Fri, 07 Sep 2001 @ 15:25:54 GMT
I think that Jim hit the nail on the head: If it affects the applications that use your datawarehouse, then it's your problem (and that of the downstream applications).
Since the existing applications have been working with the dirty data and the business is doing OK (I assume), what is it about having dirty data that is the problem? I know what you're talking about, but I think that you should look at the problem from another direction.
From my own experience, there's always dirty data of some type. The best answer has always been assuring the quality of the data as close to the source as possible. By the time it gets to the datawarehouse, it's probably too late to do anything about it. Depending on the nature of the data, attempting to scrub it at the datawarehouse front-end is likely to result in the further loss of data.
Consider an audit situation where you have to balance the contents of your datawarehouse against existing systems. If you aren't careful with your data scrubbing, then you won't be able to balance. I was involved with a project where we had to go back to the dirty data just so that we could establish an audit trail and get sign-off on the project.
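To make the balancing point concrete, here is a rough sketch of the kind of control-total check an audit might run. Everything in it is made up for illustration: the record layout, the `amount` field, and the scrubbing rule that drops negative amounts are assumptions, not anything from the project I described.

```python
from decimal import Decimal


def control_totals(rows, amount_key="amount"):
    """Return (row count, summed amount) for a list of dict records."""
    total = sum((Decimal(str(r[amount_key])) for r in rows), Decimal("0"))
    return len(rows), total


def balances(source_rows, warehouse_rows, amount_key="amount"):
    """True when both sides agree on row count and summed amount."""
    return control_totals(source_rows, amount_key) == control_totals(
        warehouse_rows, amount_key
    )


# Hypothetical extract from an existing (source) system.
source = [
    {"id": 1, "amount": "10.00"},
    {"id": 2, "amount": "5.50"},
    {"id": 3, "amount": "-1.25"},  # "dirty" by the assumed rule below
]

# Hypothetical scrub: discard records with a negative amount.
scrubbed = [r for r in source if Decimal(r["amount"]) >= 0]

unscrubbed_ok = balances(source, source)    # loaded as-is
scrubbed_ok = balances(source, scrubbed)    # loaded after scrubbing
```

Loading the data as-is balances; loading the scrubbed set does not, because the discarded record is still in the source system's totals. A real audit would compare many more measures, but the failure has the same shape.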
The argument went something like this: If they had been able to do business successfully with the data that they were already creating, what was it about the datawarehouse that made it so sensitive and unable to work in their environment? We thumped the D&C Bible, we did presentations about best practices and a bright future, we pleaded, we reasoned and eventually the message of the business community got through to us: We were trying to fix a problem that didn't affect their business. It was a noble effort and would have some business value, but there were so many legacy applications that it would have cost more to assure clean data than the value that they felt that they were going to get from the datawarehouse.
One of the problems that we tried to fix was the quality of the addresses that we were receiving. We bought a package for householding and it worked great, except that our addresses no longer matched up with those in existing applications, and that prevented the rest of our data from being reconciled with existing business records. What was eventually settled on was that we would use the dirty data, that scrubbing would be done at the point where the address was entered, and that some of the future reports (which were address-based) would be deferred to some point in the future. They just didn't have the business value to justify their cost.
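For what it's worth, one way to picture the reconciliation problem is the sketch below. It is not what we did (we went back to the dirty data); it only shows why a scrubbed address stops matching the source, and how carrying the original value alongside the scrubbed one would keep the match. The `standardize` function is a toy stand-in for a householding package, and the field names are hypothetical.

```python
def standardize(address):
    """Toy stand-in for an address-scrubbing package (assumed rules)."""
    replacements = {"STREET": "ST", "AVENUE": "AVE", "APARTMENT": "APT"}
    words = address.upper().replace(".", "").replace(",", "").split()
    return " ".join(replacements.get(w, w) for w in words)


def load_record(source_record):
    """Keep the address as received and add the scrubbed form beside it."""
    return {
        "customer_id": source_record["customer_id"],
        "address_raw": source_record["address"],
        "address_std": standardize(source_record["address"]),
    }


def reconciles(warehouse_record, source_record):
    """Reconcile on the untouched value, not on the scrubbed one."""
    return warehouse_record["address_raw"] == source_record["address"]


# Hypothetical record from an existing application.
src = {"customer_id": 42, "address": "12 main street, apartment 4"}
loaded = load_record(src)
```

Here `loaded["address_std"]` no longer equals the address in the source record, which is exactly the mismatch that bit us, while `address_raw` still reconciles. Whether the extra column is worth carrying is a business call, same as everything else in this thread.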
The householding problem exists in one form or another in virtually every business situation, so unless you are their first customer, your vendor must have encountered this situation before. What did they do in those situations?
|Last Modified: 28 Jun 2020|