
Archives of the TeradataForum

Message Posted: Mon, 24 Feb 2003 @ 21:00:50 GMT


     


Subj:   Re: Counting duplicate rows thru MultiLoad
 
From:   Kohut, Eric J

If you have a lot of duplicates (>50) and load them into a SET table, the load will slow down considerably unless there is a unique index to use for the duplicate-row check. If you do have a unique index, it will slow down only a little.
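As a sketch of the difference (table and column names here are hypothetical, not from the original post), a SET table with only a non-unique primary index forces Teradata to compare each incoming row against every existing row with the same hash, while a unique primary index lets the duplicate check piggyback on the index uniqueness check:

```sql
-- SET table, non-unique primary index: every insert is checked
-- row by row against all rows in the same hash bucket (slow when
-- many rows share a primary-index value).
CREATE SET TABLE sales_set
     ( store_id   INTEGER
     , sale_dt    DATE
     , amount     DECIMAL(12,2) )
PRIMARY INDEX ( store_id );

-- SET table with a unique primary index: the duplicate-row check
-- rides on the uniqueness check, so the overhead is small.
CREATE SET TABLE sales_set_upi
     ( sale_id    INTEGER
     , store_id   INTEGER
     , amount     DECIMAL(12,2) )
UNIQUE PRIMARY INDEX ( sale_id );
```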

If you use a MULTISET table, the load will not slow down as long as you throw the duplicate rows away. FastLoad does this automatically; MultiLoad has an option to ignore duplicate rows, and without it the load may be slow due to the row-by-row inserts into the UV table.
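The MultiLoad option is set in the .DML statement of the load script. A minimal sketch, with hypothetical label, table, and field names:

```sql
.DML LABEL INS_SALES
     IGNORE DUPLICATE INSERT ROWS;
INSERT INTO sales_ms VALUES ( :store_id, :sale_dt, :amount );
```

With IGNORE, duplicate inserts are discarded silently; the default MARK behavior routes them row by row into the UV error table, which is where the slowdown comes from.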

Note: since FastLoad automatically throws duplicates away, even a MULTISET table will not store them if you load it with the FastLoad utility.

This process works well, and in parallel, if you design it correctly. However, the check needs to be done somewhere. Duplicate rows in the final table are almost never the correct thing to do (I haven't found a reason for them yet).

MULTISET tables are fine from a performance standpoint, but you need to know the data is clean.
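One way to check that a loaded MULTISET table really is clean is a duplicate-count query over all columns (table and column names hypothetical; list every column of your table in the SELECT and GROUP BY to test for full-row duplicates):

```sql
SELECT store_id, sale_dt, amount
     , COUNT(*) AS dup_cnt
FROM   sales_ms
GROUP  BY store_id, sale_dt, amount
HAVING COUNT(*) > 1;
```

If this returns no rows, the table contains no duplicate rows over those columns.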


Thanks,

Eric

EJK
Eric J. Kohut
Senior Solutions Consultant - Teradata Solutions Group - Retail
Certified Teradata Master
NCR Corp.



     
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016