Archives of the TeradataForum
Message Posted: Thu, 07 Feb 2008 @ 18:35:26 GMT
| Subj: || || Re: Reduce loading time |
| From: || || Victor Sokovin |
Hi again Omer,
| ||- By reading the same data more than once I mean we query lets say table A and loads 10 target tables. But we don't have a single job that
reads the data and loads into targets tables, we have 10 different jobs. So basically we are querying the same data 10 times.|| |
I see; thanks for the clarification.
| ||- The link that you had sent, we do have cases like those in generating keys. And yes that takes a very long time. We have like 7 columns
that form the UPI. But if we removes them then would it not affect the join performance?|| |
In terms of joins, the 7-column UPI only helps if you join the table to another table with exactly the same 7-column (N)UPI. I'd guess it is
unlikely that you do that a lot so changing the index structure can possibly only improve some other, more useful, joins. However, your new index
is probably going to be a NUPI so you lose the uniqueness constraint but I'd say with the heavy logic already in place in the staging area and the
rest of the ETL you can probably easily get away with it. So, experiment with NUPI with less columns, make the table MULTISET and check that the
ETL provides for the elimination of potential undesired duplicates. You can then measure the benefits on the ML side and find the practical
balance between the index complexity and the ML speed.
Pay attention that the skew on the new PI is within acceptable range (don't go for the 1% target for small tables, though; Dieter is always too
strict on this).
| ||- From staging to end tables we use only MLOADs. So basically its MLOADS all the way.|| |
I see improvements here. Like I said, one way is to redesign a lot of these jobs into continuous flows. A short-term solution would be to
translate ML jobs into SQL DML jobs, INSERT ... SELECT kind of things. The advantage of the latter is that they have less overhead: no
initialization, no temporary tables or intermediate files and no technical logs. Of course, you can (and should) leave the application-level logs
in place, goes without saying. The other advantage is that you can run many small SQL jobs in parallel without performance degradation.
Good luck and let us know how you progress.