Archives of the TeradataForum

Message Posted: Wed, 10 Jan 2007 @ 18:40:29 GMT



Subj:		Re: DataStage Enterprise Edition

From:		Robert D Meunier

In response to your questions:

True, False, True, 13, Never...

But seriously:

1. How is the data getting loaded into the Teradata tables (FastLoad / MultiLoad / or others)?

Answer: DataStage has stages that let you use MultiLoad, FastLoad and TPump. In addition, you can use the Teradata API stage or the ODBC stage to load or manipulate data. We used the Teradata API stage for most of the loading we did, and used MultiLoad only when the volume was considerable. The client did not purchase TPump; otherwise we would have used TPump instead of MultiLoad, as the volume of data wasn't that great and the middle-tier server couldn't feed MultiLoad or FastLoad data fast enough. Part of this is because we pulled the data directly from the source tables as we loaded it; we never landed the data to disk. The source systems, for the most part, were incapable of feeding the data fast enough for the more robust Teradata load utilities.

2. How does it impact the number of loads that happen?

Answer: Any time you use FastLoad, MultiLoad or FastExport, it affects the number of load jobs running, regardless of where these utilities are run from. So DataStage is no different than running these load jobs from any other platform. This is one of the reasons we stuck with the API stage as often as we could, and one of the reasons why we would have preferred to use TPump.

3. How does the Teradata system recognise the DataStage sessions? (I ask because in my environment the DataStage sessions are recognised as FastLoads, and since the number of FastLoads exceeds the limit set by DBS Control, the DataStage jobs as well as the production MLOAD jobs are affected.)

Answer: See above. If you tell DataStage to use a FastLoad or MultiLoad it's going to count as a load utility.
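
One way to keep load jobs from piling up against that limit is to gate them on the scheduling side. What follows is only a rough sketch of that idea, not a DataStage or Teradata feature: MAX_LOAD_SLOTS, LoadSlotGate and fake_load_job are hypothetical names made up for illustration, and the real limit is whatever your DBA has set in DBS Control.

```python
import threading
import time

# Hypothetical value; the real limit is whatever DBS Control is set to.
MAX_LOAD_SLOTS = 2


class LoadSlotGate:
    """Lets at most max_slots load-utility jobs run at the same time."""

    def __init__(self, max_slots):
        self._slots = threading.BoundedSemaphore(max_slots)
        self._lock = threading.Lock()
        self.active = 0  # jobs currently holding a slot
        self.peak = 0    # highest concurrency seen so far

    def run(self, job):
        # Block here until a slot is free, then run the job.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return job()
            finally:
                with self._lock:
                    self.active -= 1


def fake_load_job():
    # Placeholder for kicking off a FastLoad/MultiLoad job.
    time.sleep(0.05)
    return "done"


gate = LoadSlotGate(MAX_LOAD_SLOTS)
threads = [
    threading.Thread(target=gate.run, args=(fake_load_job,))
    for _ in range(6)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent load jobs:", gate.peak)
```

Six jobs are submitted, but no more than MAX_LOAD_SLOTS hold a slot at once; the rest wait their turn instead of failing or starving the production loads.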

4. What are the lessons that need to be taken into account when developing a DataStage job for a Teradata environment?

Answer: This is a pretty vague question. The best I can say is that no matter how much experience you have with DataStage (or any other ETL tool, for that matter), you really need people with Teradata experience to help the development along. Also see the answer to question #5 below.

5. What conditions need to be taken care of for Teradata to run DataStage parallel jobs?

Answer: I'm assuming you want to know what the concerns are with running stages in parallel through DataStage. Several things come to mind:

1) Benchmark, benchmark, benchmark... Make sure you know where the bottleneck is in your architecture (it won't be Teradata). You can then design your stages accordingly.

2) Make sure you set the load utilities to run serially and not in parallel. All the stages prior to the load can run in parallel, but don't try to run the load utilities in parallel.

3) You can run the Teradata API stage in parallel and get some really good performance. I was able to get it to work successfully by using the "hash" option for splitting the data and designating all the PI columns. However, I didn't use it that often, so I don't know if it will work, without blocking itself, in all cases. If you are going to do this, it helps to have experience using TPump, as all the same problems can arise and they pretty much all have the same cures.
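
To make the "hash" idea in item 3 concrete, here is a minimal sketch in Python. It is not DataStage's partitioner and not Teradata's row-hash function; partition_by_pi, the column names and the use of crc32 are all assumptions chosen for illustration. The point it shows is that rows sharing the same PI values always land in the same partition, so presumably two parallel sessions are not writing the same PI value at the same time.

```python
import zlib
from collections import defaultdict


def partition_by_pi(rows, pi_columns, num_partitions):
    """Group rows into partitions by a hash of their PI column values."""
    partitions = defaultdict(list)
    for row in rows:
        # Build one key from all PI columns, in a fixed order.
        key = repr(tuple(row[col] for col in pi_columns))
        # crc32 is stable between runs (illustrative choice only).
        part = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[part].append(row)
    return partitions


# Hypothetical sample rows; cust_id + order_id stand in for the PI columns.
rows = [
    {"cust_id": 1, "order_id": 10, "amt": 5.0},
    {"cust_id": 1, "order_id": 10, "amt": 7.5},
    {"cust_id": 2, "order_id": 11, "amt": 1.0},
    {"cust_id": 3, "order_id": 12, "amt": 2.0},
]

parts = partition_by_pi(rows, ["cust_id", "order_id"], 4)
for part, members in sorted(parts.items()):
    print(part, [r["amt"] for r in members])
```

Both rows for cust_id 1 / order_id 10 end up in one partition, whichever one that is. Leaving a PI column out of the key would let rows for the same PI value be spread across sessions, which is where the blocking would be expected to come from.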

I hope this helps,

Robert

