Archives of the TeradataForum
Message Posted: Mon, 24 Feb 2003 @ 21:00:50 GMT
If you have a lot of duplicates (>50) and you load them into a SET table, the load will slow down considerably, because without a unique index every insert must perform a full duplicate-row check against all rows with the same row hash. If you have a unique index, the slowdown is minor.
If you use a MULTISET table, the load will not slow down as long as you throw away the duplicate rows: FastLoad discards them automatically, and MultiLoad has an option to ignore duplicate rows. Without that option, MultiLoad may be slow because the duplicates are inserted row by row into the UV error table.
Note: since FastLoad automatically throws duplicates away, even a MULTISET table will not store them if you load it with the FastLoad utility.
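For reference, the SET vs. MULTISET behavior is declared in the table DDL. A minimal sketch (table and column names here are hypothetical, not from the original post):

```sql
-- SET table: Teradata rejects full-row duplicates. Without a unique
-- index, every INSERT must compare the new row against all existing
-- rows sharing the same primary-index row hash -- this is the check
-- that gets expensive when there are many duplicates per value.
CREATE SET TABLE sales_set (
    sale_id   INTEGER,
    sale_date DATE,
    amount    DECIMAL(10,2)
) PRIMARY INDEX (sale_id);

-- With a unique index, the duplicate check rides on the index
-- instead of scanning the row-hash synonyms, so the slowdown is small.
CREATE SET TABLE sales_set_usi (
    sale_id   INTEGER,
    sale_date DATE,
    amount    DECIMAL(10,2)
) UNIQUE PRIMARY INDEX (sale_id);

-- MULTISET table: duplicate rows are stored as-is, so no per-row
-- duplicate check is performed during the load.
CREATE MULTISET TABLE sales_multi (
    sale_id   INTEGER,
    sale_date DATE,
    amount    DECIMAL(10,2)
) PRIMARY INDEX (sale_id);
```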
This process works well, and in parallel, if you design it correctly. However, the check needs to be done somewhere. Duplicate rows in the final table are almost never the correct thing to do (I haven't found a reason for them yet).
MULTISET tables are fine from a performance standpoint, but you need to know the data is clean.
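One way to confirm the data is clean after a MULTISET load is a simple duplicate-row check; a sketch against the hypothetical table above:

```sql
-- List fully duplicated rows and how many copies of each exist.
-- An empty result means no duplicate rows made it into the table.
SELECT sale_id, sale_date, amount, COUNT(*) AS dup_count
FROM sales_multi
GROUP BY sale_id, sale_date, amount
HAVING COUNT(*) > 1;
```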
|Copyright 2016 - All Rights Reserved|
|Last Modified: 27 Dec 2016|