Archives of the TeradataForum

Message Posted: Mon, 10 Jan 2011 @ 14:26:10 GMT


Subj:		Re: Why fastload doesn't allow duplicates

From:		Dieter Noeth

Anshuman.Singh wrote:

FastLoad is meant to load the data as fast as possible. Accordingly, duplicate rows are discarded, as the target table is treated like a SET table. If duplicate row checking were done, it would slow down FastLoad performance, since duplicate rows can be on different AMPs. For more details, refer to the FastLoad manual.

???

First you write it should be fast, that's why it's discarding dups.

Then you write duplicate row checks will slow down performance?

And duplicate rows will never be on different AMPs, because they must have the same PI-values.
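
To illustrate the point about AMPs, here is a minimal Python sketch. It is not Teradata's real row-hash algorithm: the MD5-based `amp_for` function, the `NUM_AMPS` value and the sample rows are all assumptions made up for the example. It only shows that any deterministic hash of the PI values sends identical rows to the same AMP.

```python
import hashlib

NUM_AMPS = 4  # hypothetical number of AMPs, for illustration only


def amp_for(pi_values, num_amps=NUM_AMPS):
    """Map primary index values to an AMP number.

    Stand-in for a row hash: MD5 is used here only because it is
    deterministic, not because Teradata uses it.
    """
    key = "|".join(str(v) for v in pi_values).encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


# Two fully identical rows; cust_id is assumed to be the primary index.
row_a = {"cust_id": 42, "name": "Smith"}
row_b = {"cust_id": 42, "name": "Smith"}

amp_a = amp_for((row_a["cust_id"],))
amp_b = amp_for((row_b["cust_id"],))

# Same PI values give the same hash, so both rows land on the same AMP.
print(amp_a == amp_b)
```

Because the AMP is derived only from the PI values, a duplicate check never needs to look at another AMP.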

FastLoad discards duplicate rows because it doesn't have or store any information about the input record sequence, like MultiLoad's Match Tag (ApplySeq+DMLSeq+ImportSeq+SMTSeq+SourceSeq). So it simply doesn't know whether a row was a duplicate within the data or was sent twice because of a restarted FastLoad (in Application Phase).
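
A rough Python sketch of that ambiguity, under stated assumptions: the two loader functions, the single-integer sequence number and the sample data are hypothetical and only mimic the idea of a match tag, not the real FastLoad or MultiLoad internals.

```python
def load_without_tag(received_rows):
    """No sequence info: identical rows are indistinguishable,
    so the only safe choice is to keep one copy of each."""
    return set(received_rows)


def load_with_tag(received_rows):
    """received_rows holds (seq, row) pairs. The same seq arriving
    twice means a resend, so only the first arrival is kept."""
    seen = {}
    for seq, row in received_rows:
        seen.setdefault(seq, row)
    return [seen[s] for s in sorted(seen)]


# Source data with one genuine duplicate (records 0 and 1).
source = [("A", 1), ("A", 1), ("B", 2)]

# Simulated restart: record 2 is sent a second time.
sent = list(enumerate(source)) + [(2, source[2])]

untagged = load_without_tag(row for _, row in sent)
tagged = load_with_tag(sent)

print(len(untagged))     # genuine duplicate and resend are both gone
print(tagged == source)  # resend dropped, genuine duplicate kept
```

Without the tag, the genuine duplicate and the resent row look exactly the same, so both have to be dropped. With the tag, only the resend is dropped.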

If FastLoad were able to load MultiSet tables like MLoad, there would be more overhead in perm space. Currently the intermediate size of the target table is (almost) the same as the final size, and this is one of the big advantages of FastLoad over MLoad.

I think FastLoad is older than MultiSet tables, and there's no reason to add that feature as long as there's MLoad.

Dieter

