Archives of the TeradataForum
Message Posted: Wed, 10 Jan 2007 @ 18:40:29 GMT
In response to your questions:
True, False, True, 13, Never...
Answer: DataStage has stages that let you use MultiLoad, FastLoad and TPump. In addition, you can use the Teradata API stage or the ODBC stage to load or manipulate data. We used the Teradata API stage for most of our loading and used MultiLoad only when the volume was considerable. The client did not purchase TPump; otherwise we would have used TPump instead of MultiLoad, because the volume of data wasn't that great and the middle-tier server couldn't feed MultiLoad or FastLoad data fast enough. Part of the reason is that we pulled the data directly from the source tables as we loaded it; we never landed the data to disk. The source systems, for the most part, were incapable of feeding the data fast enough for the more robust Teradata load utilities.
Answer: Any time you use FastLoad, MultiLoad or FastExport, it affects the number of load jobs running, regardless of where the utilities are run from. So DataStage is no different than running these load jobs from any other platform. This is one of the reasons we stuck with the API stage as often as we could, and one of the reasons we would have preferred to use TPump.
Answer: See above. If you tell DataStage to use FastLoad or MultiLoad, it's going to count as a load utility.
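To make the counting rule concrete, here is a minimal sketch in Python. It is only an illustration of the point above, not anything DataStage or Teradata provides: the function names, the job tuples and the `max_load_jobs` parameter are all made up for the example, and the actual limit is whatever your Teradata system is configured for.

```python
# Utilities that, per the answers above, count toward the number of
# load jobs running. TPump and the API/ODBC stages are assumed not to.
SLOT_CONSUMING = {"fastload", "multiload", "fastexport"}


def load_slots_used(jobs):
    """Count running jobs that occupy a load-utility slot.

    `jobs` is a list of (method, launched_from) tuples. Where the job
    was launched from is deliberately ignored: it makes no difference.
    """
    return sum(1 for method, _origin in jobs if method.lower() in SLOT_CONSUMING)


def can_start(jobs, new_method, max_load_jobs):
    """Return True if a new job can start without exceeding the limit.

    `max_load_jobs` is a hypothetical parameter standing in for the
    limit configured on the Teradata system.
    """
    if new_method.lower() not in SLOT_CONSUMING:
        return True  # API/ODBC/TPump style loads do not take a slot here
    return load_slots_used(jobs) < max_load_jobs


running = [
    ("MultiLoad", "datastage"),
    ("FastExport", "mainframe"),
    ("TPump", "datastage"),
    ("API", "datastage"),
]
print(load_slots_used(running))
print(can_start(running, "FastLoad", max_load_jobs=2))
print(can_start(running, "API", max_load_jobs=2))
```

The origin field is carried along only to show that it plays no part in the count, which is the reason the API stage was attractive.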
Answer: This is a pretty vague question. The best I can say is that no matter how much experience you have with DataStage (or any other ETL tool, for that matter), you really need people with Teradata experience to help the development along. Also see the answer to question #5 below.
Answer: I'm assuming you want to know what the concerns are with running stages in parallel through DataStage. Several things come to mind:
1) Benchmark, benchmark, benchmark... Make sure you know where the bottleneck is in your architecture (it won't be Teradata). You can then design your stages accordingly.
2) Make sure you set the load utilities to run serially and not in parallel. All the stages prior to the load can run in parallel, but don't try to run the load utilities in parallel.
3) You can run the Teradata API stage in parallel and get some really good performance. I was able to get it to work successfully by using the "hash" option for splitting the data and designating all the PI columns. However, I didn't use it that often, so I don't know whether it will work, without blocking itself, in all cases. If you are going to do this, it helps to have experience with TPump, as all the same problems can arise and they pretty much all have the same cures.
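The idea behind point 3 can be sketched in a few lines of Python. This is an assumption-laden illustration, not DataStage's hash partitioner and not Teradata's row hash: `partition_for`, `split_rows` and the sample column names are hypothetical, and CRC32 is used only because it is a stable hash in the standard library. The point is that rows sharing the same PI values always land in the same parallel stream, so two streams should not end up contending for the same PI value.

```python
import zlib


def partition_for(row, pi_columns, n_partitions):
    """Pick a partition from a stable hash of the row's PI column values."""
    # Join the PI values with a separator that is unlikely to appear in data.
    key = "\x1f".join(str(row[c]) for c in pi_columns).encode("utf-8")
    return zlib.crc32(key) % n_partitions


def split_rows(rows, pi_columns, n_partitions):
    """Distribute rows across partitions, keeping equal PI values together."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[partition_for(row, pi_columns, n_partitions)].append(row)
    return parts


# Hypothetical sample data: cust_id plays the role of the PI column.
rows = [{"cust_id": i % 5, "order_id": i} for i in range(20)]
parts = split_rows(rows, ["cust_id"], n_partitions=4)

for index, part in enumerate(parts):
    print(index, sorted({r["cust_id"] for r in part}))
```

Each `cust_id` shows up in exactly one partition's list. If you hashed on something other than the PI columns, the same PI value could be spread over several streams, which is the kind of self-blocking to watch for.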
I hope this helps,
Last Modified: 27 Dec 2016