Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Thu, 07 Feb 2008 @ 18:35:26 GMT


     
  <Prev Next>   <<First <Prev
Next>
Last>>
 


Subj:   Re: Reduce loading time
 
From:   Victor Sokovin

Hi again Omer,

  - By reading the same data more than once I mean we query lets say table A and loads 10 target tables. But we don't have a single job that reads the data and loads into targets tables, we have 10 different jobs. So basically we are querying the same data 10 times.  


I see; thanks for the clarification.


  - We are using FSLDM  


  - The link that you had sent, we do have cases like those in generating keys. And yes that takes a very long time. We have like 7 columns that form the UPI. But if we removes them then would it not affect the join performance?  


In terms of joins, the 7-column UPI only helps if you join the table to another table with exactly the same 7-column (N)UPI. I'd guess it is unlikely that you do that a lot so changing the index structure can possibly only improve some other, more useful, joins. However, your new index is probably going to be a NUPI so you lose the uniqueness constraint but I'd say with the heavy logic already in place in the staging area and the rest of the ETL you can probably easily get away with it. So, experiment with NUPI with less columns, make the table MULTISET and check that the ETL provides for the elimination of potential undesired duplicates. You can then measure the benefits on the ML side and find the practical balance between the index complexity and the ML speed.

Pay attention that the skew on the new PI is within acceptable range (don't go for the 1% target for small tables, though; Dieter is always too strict on this).


  - From staging to end tables we use only MLOADs. So basically its MLOADS all the way.  


I see improvements here. Like I said, one way is to redesign a lot of these jobs into continuous flows. A short-term solution would be to translate ML jobs into SQL DML jobs, INSERT ... SELECT kind of things. The advantage of the latter is that they have less overhead: no initialization, no temporary tables or intermediate files and no technical logs. Of course, you can (and should) leave the application-level logs in place, goes without saying. The other advantage is that you can run many small SQL jobs in parallel without performance degradation.

Good luck and let us know how you progress.


Victor



     
  <Prev Next>   <<First <Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 23 Jun 2019