 

Archives of the TeradataForum

Message Posted: Thu, 05 Jun 2003 @ 13:41:06 GMT




Subj:   PPI's with INSERT
 
From:   David Clough

Hi guys,

Perhaps I've misunderstood some of the features of PPIs, but I was under the impression that adding new rows to a populated table would act like adding rows to an empty table whenever the new rows go into an empty partition, i.e., transient journaling would not be necessary.

I have this table definition:

CREATE MULTISET TABLE TESTCONS_T.CONKPI_NEW ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130560 BYTES
     (
      CON_SRKY_ID INTEGER  NOT NULL,
      CON_TYPE_CD CHAR(2) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS 'WI',
      DTE_YRWK_NR_KPN INTEGER  NOT NULL ,
      CON_DELIVERY_DT      DATE,
      CON_DELIVERY_TM      TIME(0),
      CON_INCMPL_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_NEG_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_UNDEL_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_PTPND_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
   .
   .
   .
      CON_MANNRCVD_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_NO_TR_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_NONEG_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('Y','N'))
PRIMARY INDEX OPPCON_UPI ( CON_SRKY_ID )
PARTITION BY RANGE_N
(DTE_YRWK_NR_KPN BETWEEN 200301 AND 200552 EACH 1);

It will hold our Key Performance Indicators for one of our main entities.

The intention of the partitioning scheme is to have one partition per week.
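As an aside, one way to confirm that rows are landing where expected is Teradata's system-derived PARTITION column, which is available on any PPI table (a sketch, assuming read access to the populated table):

```sql
-- Count rows per week value and per internal partition number.
-- PARTITION is Teradata's system-derived column on PPI tables.
SELECT DTE_YRWK_NR_KPN,
       PARTITION,
       COUNT(*)
FROM   TESTCONS_T.CONKPI_NEW
GROUP  BY 1, 2
ORDER  BY 1;
```

With one partition per week, each DTE_YRWK_NR_KPN value should map to exactly one PARTITION number.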

I added ~700K rows into a partition for DTE_YRWK_NR_KPN = 200315, which took 8 seconds.

I then added ~620K rows into another partition (DTE_YRWK_NR_KPN = 200316), which took 35 seconds.

I then inserted the same 620K rows (i.e., DTE_YRWK_NR_KPN = 200316) a second time, which also took 35 seconds.

What I would not have expected was the time taken on the second of the three inserts (I was expecting something like 6 seconds).

It might be worth pointing out that I was adding the data from a similar table (with exactly the same Primary Index) that is not partitioned. The INSERT explain I got is shown below.
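For reference, the statement presumably looked something like this (a sketch; the exact column list may have differed, but the source table and condition are taken from the explain that follows):

```sql
-- INSERT/SELECT from the non-partitioned copy into the PPI table
INSERT INTO TESTCONS_T.CONKPI_NEW
SELECT *
FROM   TESTCONS_T.CONKPI
WHERE  DTE_YRWK_NR_KPN = 200316;
```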

Explanation
--------------------------------------------------
 
  1) First, we lock a distinct TESTCONS_T."pseudo table" for write on a RowHash to prevent global deadlock for TESTCONS_T.CONKPI_NEW.
  2) Next, we lock a distinct TESTCONS_T."pseudo table" for read on a RowHash to prevent global deadlock for TESTCONS_T.CONKPI.
  3) We lock TESTCONS_T.CONKPI_NEW for write, and we lock TESTCONS_T.CONKPI for read.
  4) We do an all-AMPs RETRIEVE step from TESTCONS_T.CONKPI by way of an all-rows scan with a condition of ("TESTCONS_T.CONKPI.DTE_YRWK_NR_KPN = 200316") into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to partition Spool 1 by rowkey. The size of Spool 1 is estimated with high confidence to be 620,622 rows. The estimated time for this step is 11.12 seconds.
  5) We do an all-AMPs MERGE into TESTCONS_T.CONKPI_NEW from Spool 1 (Last Use).
  6) We spoil the parser's dictionary cache for the table.
  -> No rows are returned to the user as the result of statement 1.


There is no mention of partitions. In fact, the data has been REDISTRIBUTED, which I didn't expect and don't understand.

Also, I am running in ANSI mode.

Certainly, if I read the table for week 15, thus:

Select * from TESTCONS_T.CONKPI_NEW where DTE_YRWK_NR_KPN = 200315


I get the expected partition elimination:

Explanation
--------------------------------------------------
 
  1) First, we lock a distinct TESTCONS_T."pseudo table" for read on a RowHash to prevent global deadlock for TESTCONS_T.CONKPI_NEW.
  2) Next, we lock TESTCONS_T.CONKPI_NEW for read.
  3) We do an all-AMPs RETRIEVE step from a single partition of TESTCONS_T.CONKPI_NEW with a condition of ("TESTCONS_T.CONKPI_NEW.DTE_YRWK_NR_KPN = 200315") with a residual condition of ("TESTCONS_T.CONKPI_NEW.DTE_YRWK_NR_KPN = 200315") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 195,383 rows. The estimated time for this step is 9.84 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 9.84 seconds.


Also, if I delete from just one partition, I get an explain that performs partition elimination:
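The delete was presumably of this form (a sketch; the partition condition is taken from the explain that follows):

```sql
-- Delete everything in the week-200316 partition
DELETE FROM TESTCONS_T.CONKPI_NEW
WHERE  DTE_YRWK_NR_KPN = 200316;
```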

Explanation
--------------------------------------------------
 
  1) First, we lock a distinct TESTCONS_T."pseudo table" for write on a RowHash to prevent global deadlock for TESTCONS_T.CONKPI_NEW.
  2) Next, we lock TESTCONS_T.CONKPI_NEW for write.
  3) We do an all-AMPs DELETE from a single partition of TESTCONS_T.CONKPI_NEW with a condition of ("TESTCONS_T.CONKPI_NEW.DTE_YRWK_NR_KPN = 200316") with a residual condition of ("TESTCONS_T.CONKPI_NEW.DTE_YRWK_NR_KPN = 200316").
  -> No rows are returned to the user as the result of statement 1.


but even this took 24 seconds to remove roughly 1.2 million rows (i.e., the two 620K-row inserts combined).

So ... have I partly misunderstood PPIs?!


Dave Clough
Database Designer
Database Design Group



 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016