Archives of the TeradataForum
Message Posted: Fri, 29 Apr 2005 @ 13:52:19 GMT
Subj: | | Re: MultiLoad performance vs NUPI |
|
From: | | Prescott, Kyle R |
MLOAD performs fairly well unless you have skewed or lumpy data. We have found that FASTLOAD generally outperforms MLOAD for skewed or lumpy
data. Two things to look at immediately are:
1) The number of hash-collisions within the non-unique primary index (NUPI column values that are hashing to the same row hash). If you have
many collisions, it will slow the load down as Teradata has to assign a uniqueness value to each row within the row hash (scanning all the
previous hash values within that row hash and assigning a sequence in order to distinguish one row from another). The deeper the number of
collisions, the more pronounced the problem. An easy query to verify the number of collisions is:
select hashrow() as rowhash_value
,count(*) as collision_cnt
from tablename
group by 1
order by 2 desc;
If you have many collisions you may want to expand the number of columns involved in the NUPI to add cardinality (reducing the number of
collisions). Also may want to define the table as MULTISET (if not already) as MLOAD will not check for duplicate rows during the load - also a
time saver. For changing SET tables to MULTISET tables, you have ensure there are no duplicate rows in the input or add some auditing to the post
load process to verify there are no duplicate rows. For only 11 million rows, MULTISET may not be necessary.
2) Is the input data sorted by NUPI column values? If so and you have many rows within some/each row hash, you will also slow the load down as
things are funneled to the same amp repetitively in succession.
Kyle Prescott
UnumProvident, Data Strategies
|