Archives of the TeradataForum

Message Posted: Fri, 21 Feb 2003 @ 22:17:02 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Can it be any faster

From:		Paul Johnson

This will take ages the way I type, but here goes...

Firstly, let's NOT assume that this is a problem. My first objective when faced with an 'problem' like this is to establish that I agree that there is a problem. If it's a single node machine then it looks quick, but if it's a 100+ node monster with nothing else running then it looks slow! There's no such thing as a vanilla Teradata platform out there.

What we need to know is the number of Rows per Amp Per Second (RAS) that we're getting. This is the metric that takes machine size into account. From memory I got about 1,000 RAS on a 48 AMP/4 node 5250 running V2R4.1 when I last ran a sizeable FastLoad of 400,000,000 rows on an otherwise empty machine. I would certainly expect a RAS figure in the hundreds for a FastLoad with decent performance.

In response to uknair's post (sorry we don't know your name):

- the modelling technique adopted is irrelevant as FastLoad is shipping data into an empty table that is basically a DBMS- based version of the client based (Unix/MVS/Win) file, maybe with some fields dropped and some re-ordering of fields into the target columns. It neither knows nor cares what the data model looks like.

- the number of sessions is not directly related to performance. If you measure the CPU/IO usage of each session you're likely to find that not all did any meaningful work. You may be able to drop the sessions down to 12 or so and see no decline in performance. In the V1 days the rule was 1 session per AMP but the advice on the algorithm to use these days varies according to who you ask.

In response to Doug's post:

1.) Hash collisions - are you getting many duplicates of the same PI?

Good point - what index are you using? UPI or NUPI? If NUPI what is the average number of occurrences of each NUPI value? Also, and often more importantly, what is the maximum number of occurrences of a single NUPI value? There is no excuse for not getting perfect, or near perfect, distribution via FastLoad as it is normally possible to use the logical key as the index. This has the benefit of ensuring the input file is as expected i.e. contains no duplicate keys.

2.) Order of the input dataset - If the data coming into the job is in the order of the PI and there are lots of duplicates you're likely to be experiencing hot AMPs. Most of the rows are being processed and therefore work is being performed on a single AMP at a time.

Good point - it can often be advantageous to sort a FastLoad input file on a 'dummy' sort field, such as account balance, to ensure there are no repeating key sequences on the input. This is only relevant if the target table has a NUPI though.

3.) Dup row check - related to 1 above, if you have many duplicate values of the PI and the table is Multiset a dup row check will be performed. The higher degree of duplication will cause dup row checks to rise exponentially. You can disable this check by using a SET table but be aware that duplicates will be accepted

Not so good point - duplicate row checking is performed on SET tables, not MULTISET tables! I'll assume that was a type Doug :-) However, FastLoad doesn't support the creation of duplicate rows into MULTISET tables anyway. Maybe that's changed with seven-dot- oh?

4.) Other concurrent workload and your priority - the obvious, contention from other queries relative to your assigned priority.

Good point - performance, or a perceived lack of it, can only be weighed up against concurrent workload. What was the machine state at the time?

5.) Network issues - you might be constrained on the network/channel over which the data is transmitted.

Good point - I have seen FastLoad run at horrendous speed due to network problems. In fact, the same one that got 1,000 RAS ran at about 1% of this speed due to network issues a few weeks earlier. These issues were not bandwidth related and were to do with the way the 5250 had been configured. Sorry I can't remember the precise details.

The other information that would be useful might include:

- number of nodes and type
- CPU speed
- number of AMPs
- DBMS version
- FastLoad version
- client environment (Unix/MVS/Win)
- input file media type (tape/disk) and media spec
- file layout
- target table DDL
- FastLoad script
- what else was running at the time?
- what priority is your userid?
- is PSF implemented in order to control resource allocation?
- was there any mini-cylpack activity at the time?
- was there any blocking at the time?
- how long did each phase take (load v end load)?
- were there any tape mounting delays (an old MVS favourite)?

Can't think of any more...it's after 10pm Friday night here!

Hope that helps,

Paul Johnson.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference