Re: Teradata Peformance issues when loading data to Teradata using ETL tools like Informatica or Data Stage [From Michael Larkins: Mon, 29 Mar 2004 @ 15:58 GMT]

Message Posted: Mon, 29 Mar 2004 @ 15:58:54 GMT

Subj:

Re: Teradata Peformance issues when loading data to Teradata using ETL tools like Informatica or Data Stage

From:

Michael Larkins

Hi Craig:

I am going to take a shot at an answer to your Fastload question, regarding "unloading." The reason for the quotes is that Fastload (INSERT command)only loads and does not retrieve rows. So, I will guess that you are thinking of FastExport, which does unload (SELECT command).

With that stated, Fastload does not use the Transient Journal (TJ) or SPOOL to do its inserts, hence the "fast" part of its name. The TJ contains the before-image of any changed row for all transactions to guarantee database integrity. The transaction must complete successfully, or no work should be done at all. However, one INSERT completes work. Therefore, the TJ is used rollback any unsuccessful transaction. If this is concept is already known to you, I apologize for reiterating it.

On the extract/export side of the equation, FastExport does not use the TJ, because there is no change with a read. Instead, it checkpoints blocks to be transferred. However, it does use SPOOL to temporarily hold the rows being read. Then, it must build the large data blocks that will be sent back to the client computer running FastExport. Although the building of the blocks takes time, it normally saves a lot of time doing the actual transfer to the other computer. Plus, the blocks are built in PERM space, so if something bad happens, FastExport can be restarted from the point of "failure" without doing the SELECT again.

What can slow down FastExport is an ORDER BY. Since rows have to come back in a predictable sequence, Teradata cannot build the blocks randomly. The rows must be sorted and used to build the blocks in sequence. In order to do this, the AMPs must do 2 different redistributions of rows to get the first row (and subsequent rows) into the first block.

The logical reaction is to say, "Don't use the ORDER BY for speed." However, if the rows must be in a predictable sequence, then a sort must be done somewhere. I would suggest doing it in parallel on Teradata instead of on the other computer in a more serial fashion. Plus as your number of extracted rows nears 1M or more, the speed of FastExport is noticably faster than a tool like BTEQ (row oriented vs block) that starts sending rows back as soon as the SELECT finishes. In addition, most of these tools do not have restart capabilities. The SELECT is required to run again if a single component (Teradata, other computer, network, et al) fails.

I hope this is the type of info that you are looking for.

Best regards,

Michael Larkins
Certified Teradata Master

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Mon, 29 Mar 2004 @ 15:58:54 GMT