Home Page for the TeradataForum

Archives of the TeradataForum

Message Posted: Mon, 10 Jan 2011 @ 14:26:10 GMT


Subj:   Re: Why fastload doesn't allow duplicates
From:   Dieter Noeth

Anshuman.Singh wrote:

  FastLoad is meant to load the data as fast as possible. Accordingly, duplicate rows are discarded, as the target table is treated like a SET table. If duplicate row checking is done, it will slow down FastLoad performance, since duplicate rows can be on different AMPs. For more details, refer to the FastLoad manual.


First you write it should be fast, that's why it's discarding dups.

Then you write duplicate row checks will slow down performance?

And duplicate rows will never be on different AMPs, because they must have the same PI-values.
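
A toy sketch of why that is: a row's AMP is determined by hashing its PI value, so two identical rows always hash to the same place. The hash function (md5), the AMP count and the helper name `amp_for` are stand-ins chosen for illustration, not Teradata's actual row-hash algorithm or hash map.

```python
import hashlib

NUM_AMPS = 4  # hypothetical AMP count, for illustration only


def amp_for(pi_value: str, num_amps: int = NUM_AMPS) -> int:
    """Map a primary index value to an AMP number.

    Stand-in for the real row hash: md5 is used only because it is
    deterministic across runs, not because Teradata uses it.
    """
    digest = hashlib.md5(pi_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


# First column plays the role of the PI column.
row_a = ("42", "Smith", "2011-01-10")
row_b = ("42", "Smith", "2011-01-10")  # exact duplicate of row_a

# Identical rows have identical PI values, hence the same AMP.
same_amp = amp_for(row_a[0]) == amp_for(row_b[0])
```

Whatever the real hash looks like, it is a function of the PI value only, so a duplicate check never needs to look at another AMP.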

FastLoad discards duplicate rows because it doesn't have or store any information about the input record sequence, like MultiLoad's Match Tag (ApplySeq+DMLSeq+ImportSeq+SMTSeq+SourceSeq). Thus it simply doesn't know whether a row was a duplicate within the data or was sent twice because of a restarted FastLoad (in Application Phase).
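
The difference can be sketched with a toy model. This is not how either utility is implemented; the function names, the single sequence number standing in for the Match Tag, and the restart scenario are all assumptions made for illustration.

```python
def load_without_tags(sent_rows):
    """Receiver keeps only row content, so it can dedupe only by value."""
    return sorted(set(sent_rows))


def load_with_tags(sent_rows_with_seq):
    """Receiver keeps (sequence, row); a repeated sequence number is a resend."""
    seen = {}
    for seq, row in sent_rows_with_seq:
        seen[seq] = row  # a resend of the same seq overwrites, it adds nothing
    return [seen[seq] for seq in sorted(seen)]


source = ["A", "B", "B", "C"]  # the second "B" is a genuine duplicate

# Hypothetical restart: records 0..2 were sent, the job failed, and the
# client resent everything from record 1 onwards.
numbered = list(enumerate(source))
sent = numbered[:3] + numbered[1:]

untagged = load_without_tags([row for _, row in sent])  # ["A", "B", "C"]
tagged = load_with_tags(sent)                            # ["A", "B", "B", "C"]
```

Without a tag, the only safe way to get rid of the resent rows is to drop every duplicate, including the genuine one. With a tag the genuine duplicate survives, at the cost of storing an extra value per row.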

If FastLoad were able to load MultiSet tables like MLoad, there would be more overhead regarding perm space. Currently the intermediate size of the target table is (almost) the same as the final size, and this is one of the big advantages of FastLoad over MLoad.

I think FastLoad is older than MultiSet tables, and there's no reason to add that feature as long as there's MLoad.


Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023