Archives of the TeradataForum
Message Posted: Fri, 15 Feb 2002 @ 15:26:13 GMT
Multiset tables may be needed by the business if ETL, particularly the "E", does not provide a unique identifier i.e. Multiset tables are required if the extract from source has too few columns to make EVERY column unique e.g. (real life example) when a customer withdraws cash from an ATM and immediately makes a second withdrawal because they realised they needed twice the initial amount, then you will need the transaction number or precise time of the transaction (HHMM is not enough) or a row number, system generated by "TL", or a multiset table to cope with the "apparent" duplications.
Suggestions to investigate/consider in endeavours to reduce load time:
- Check ETL process avoids loading too many nulls, particularly in index columns
- Use NUPI rather than UPI at least on initial tables.
- Use same PI on initial tables as on final tables, otherwise Teradata is unnecessarily lifting and respreading the data across the nodes.
- Check that tables are appropriately normalised
- Consider volume of data per record, i.e. record length, it might be more practical to use/load two tables instead of one although that requires increased space for the key of the second half of the data.
- Reconsider need for daily update to large table - you might satisfy business requirements by separate tables: a small table with recent transactions and a larger table for history into which the small table is "emptied" using slack time e.g. at weekends.
- Alternatively consider dynamic continuous update e.g. using TPump but note that that will require every row to be journaled and therefore use more cpu.
- Another "off the wall", I seem to remember that Teradata prefers to load and store numerical data as numbers instead of characters, irrespective of whether calculations will be performed on them. This might be particularly relevant for ID Numbers in columns that are indexed.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|