|
|
Archives of the TeradataForum
Message Posted: Fri, 19 Jan 2001 @ 13:56:25 GMT
Subj: | | Re: Unique Value Algorithm |
|
From: | | Claybourne Barrineau |
Glenn and Frank,
I tried the CSUM approach to this issue, and it seems to work fine with a small amount of data.
However, I was planning to build about 46 million records, and the CSUM approach apparently redistributes all the data (even though my
source and target file have the same primary index.)
explain
Insert into INFVOLTSTT.T_F_VOL_ATOMIC_BDL1
(
Uniq_Code,
Bas_Artl_Nbr,
Btlr_Org_ID,
Bsns_Lctn_ID,
Setl_Dt,
Phys_Case_Qty,
Grs_Rvnu_Amt,
Artl_Org_ID
)
Select csum(1, Setl_Dt, Bas_Artl_Nbr, Bsns_Lctn_ID),
Bas_Artl_Nbr,
Btlr_Org_ID,
Bsns_Lctn_ID,
Setl_Dt,
Phys_Case_Qty,
Grs_Rvnu_Amt,
Artl_Org_ID
From XEIWTMPT.TBDL700_Clay;
Explanation -------------------------------------------------- | |
| 1) | First, we lock a distinct INFVOLTSTT."pseudo table" for write on a RowHash to prevent global deadlock for
INFVOLTSTT.T_F_VOL_ATOMIC_BDL1.
| |
| 2) | Next, we lock a distinct XEIWTMPT."pseudo table" for read on a RowHash to prevent global deadlock for XEIWTMPT.TBDL700_Clay.
| |
| 3) | We lock INFVOLTSTT.T_F_VOL_ATOMIC_BDL1 for write, and we lock XEIWTMPT.TBDL700_Clay for read.
| |
| 4) | We do an all-AMPs STAT FUNCTION step from XEIWTMPT.TBDL700_Clay by way of an all-rows scan with no residual conditions into Spool 5
(Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3, which is built locally on the
AMPs.
| |
| 5) | We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1, which is redistributed by hash code
to all AMPs. Then we do a SORT to order Spool 1 by row hash. The size of Spool 1 is estimated with no confidence to be 0 row. The
estimated time for this step is 0.05 seconds.
| |
| 6) | We do a MERGE into INFVOLTSTT.T_F_VOL_ATOMIC_BDL1 from Spool 1 (Last Use).
| |
| 7) | Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
| |
| -> | No rows are returned to the user as the result of statement 1.
| |
I ended up using the HASHROW function that Frank Martinez recommend (Thank You!) This approach seems to be the best approach for a
large amount of data. No data redistribution took place (if in fact that was the cause of the problem in the CSUM aproach).
Thank you all for your help and input.
Claybourne Barrineau
| |