Archives of the TeradataForum

Message Posted: Fri, 29 Apr 2005 @ 13:52:19 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: MultiLoad performance vs NUPI

From:		Prescott, Kyle R

MLOAD performs fairly well unless you have skewed or lumpy data. We have found that FASTLOAD generally outperforms MLOAD for skewed or lumpy data. Two things to look at immediately are:

1) The number of hash-collisions within the non-unique primary index (NUPI column values that are hashing to the same row hash). If you have many collisions, it will slow the load down as Teradata has to assign a uniqueness value to each row within the row hash (scanning all the previous hash values within that row hash and assigning a sequence in order to distinguish one row from another). The deeper the number of collisions, the more pronounced the problem. An easy query to verify the number of collisions is:

     select hashrow() as rowhash_value
           ,count(*) as collision_cnt
     from tablename
     group by  1
     order by 2 desc;

If you have many collisions you may want to expand the number of columns involved in the NUPI to add cardinality (reducing the number of collisions). Also may want to define the table as MULTISET (if not already) as MLOAD will not check for duplicate rows during the load - also a time saver. For changing SET tables to MULTISET tables, you have ensure there are no duplicate rows in the input or add some auditing to the post load process to verify there are no duplicate rows. For only 11 million rows, MULTISET may not be necessary.

2) Is the input data sorted by NUPI column values? If so and you have many rows within some/each row hash, you will also slow the load down as things are funneled to the same amp repetitively in succession.

Kyle Prescott
UnumProvident, Data Strategies


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference