Home Page for the TeradataForum

Archives of the TeradataForum

Message Posted: Wed, 06 Feb 2008 @ 19:59:45 GMT

  <Prev Next>   <<First <Prev Next> Last>>  

Subj:   Re: Data extract methods
From:   Leslie, Quincy

I have "recovered" or "caught up" about seven or eight enterprise data warehouses over the last fifteen years. That does not include enterprise scale systems integrations, also with issues.

Source system "extracts" are a very straight forward matter. A) There is the source (it is identified), B) extract the data, usually into a flat file. So my suggestion is: No matter what the extraction tool, extract to a pipe-delimited-fields flat file. Model these flat file with a modeling tool!! Perform syntactical and symantical comparative analysis of the data sources. I.e. numeric match numerics, char to char, field lengths match up and an address from one source is the same type of address as other sources.

The major issue is the Transformation and Loading tool. If you acquire a tool that is predicated on transforming a flat file, then that is the traditional ETL (extract, transform load the target DW DBMS), or if you have a tool that utilizes a database, then you are ELT (extract, load database, then transform within the database).

I would strongly recommend a core approach that included a decent data modeling tool, such as Erwin (modest price, stable, fairly easy to use, supports the major DBMS's), and then develop a third normal form data model.

Get the best SME's who understand the entities in the enterprise and the transactions from the source data. Sooner or later, dimensional modeling will be inserted in the architecture. But if you don't ensure that the 3NF (or very close to it, and probably Boyce-Codd Normal Form won't be achieved, most of the time), then you will later invoke significant levels of effort (i.e. massive), to capture the data, load the data, integrate the data, and esp. to support reporting and all the Business Intelligence, etc. that a good data warehouse must support.

Unless you have the best practitioner in the world, avoid OOD (object oriented design and development). This will wreak havoc and actually requires an OOD database, which no one currently markets with any penetration.

In the final analysis, the above is vendor independent (unless they hard-sell a different approach).

My initial 2CW...

Quincy Leslie
Sr. Systems Coordinator - EDW/BI

  <Prev Next>   <<First <Prev Next> Last>>  
  Top Home Privacy Feedback  
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 28 Jun 2020