 

Archives of the TeradataForum

Message Posted: Thu, 05 Jun 2003 @ 13:41:06 GMT




Subj:   PPI's with INSERT
 
From:   David Clough

Hi guys,

Perhaps I've misunderstood some of the features of PPIs, but I was under the impression that adding new rows to a populated table would act like adding rows to an empty table whenever the new rows go into an empty partition, i.e., transient journaling would not be necessary.

I have this table definition:

CREATE MULTISET TABLE TESTCONS_T.CONKPI_NEW ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     DATABLOCKSIZE = 130560 BYTES
     (
      CON_SRKY_ID INTEGER  NOT NULL,
      CON_TYPE_CD CHAR(2) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS 'WI',
      DTE_YRWK_NR_KPN INTEGER  NOT NULL ,
      CON_DELIVERY_DT      DATE,
      CON_DELIVERY_TM      TIME(0),
      CON_INCMPL_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_NEG_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_UNDEL_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_PTPND_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
   .
   .
   .
      CON_MANNRCVD_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_NO_TR_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('N','Y'),
      CON_NONEG_IN CHAR(1) CHARACTER SET LATIN CASESPECIFIC  NOT NULL  COMPRESS ('Y','N'))
PRIMARY INDEX OPPCON_UPI ( CON_SRKY_ID )
PARTITION BY RANGE_N
(DTE_YRWK_NR_KPN BETWEEN 200301 AND 200552 EACH 1);

It will hold our Key Performance Indicators for one of our main entities.

The intention of the partitioning scheme is to have one partition per week.
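As an aside, one way to confirm that rows are landing where expected is Teradata's system-derived PARTITION column, which is available on any PPI table (a sketch, assuming read access to the populated table):

```sql
-- Count rows per week value and per internal partition number.
-- PARTITION is Teradata's system-derived column on PPI tables.
SELECT DTE_YRWK_NR_KPN,
       PARTITION,
       COUNT(*)
FROM   TESTCONS_T.CONKPI_NEW
GROUP  BY 1, 2
ORDER  BY 1;
```

With one partition per week, each DTE_YRWK_NR_KPN value should map to exactly one PARTITION number.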

I added ~700K rows into a partition for DTE_YRWK_NR_KPN = 200315, which took 8 seconds.

I then added ~620K rows into another partition (DTE_YRWK_NR_KPN = 200316), which took 35 seconds.

I then inserted the same 620K rows (i.e., DTE_YRWK_NR_KPN = 200316) a second time, which also took 35 seconds.

What I would not have expected was the time taken on the second of the three inserts (I was expecting something like 6 seconds).

It might be worth pointing out that I was adding the data from a similar table (with exactly the same Primary Index) that is not partitioned. The INSERT explain I got is shown below.
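For reference, the statement presumably looked something like this (a sketch; the exact column list may have differed, but the source table and condition are taken from the explain that follows):

```sql
-- INSERT/SELECT from the non-partitioned copy into the PPI table
INSERT INTO TESTCONS_T.CONKPI_NEW
SELECT *
FROM   TESTCONS_T.CONKPI
WHERE  DTE_YRWK_NR_KPN = 200316;
```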

Explanation
--------------------------------------------------
 
  1) First, we lock a distinct TESTCONS_T."pseudo table" for write on a RowHash to prevent global deadlock for TESTCONS_T.CONKPI_NEW.
  2) Next, we lock a distinct TESTCONS_T."pseudo table" for read on a RowHash to prevent global deadlock for TESTCONS_T.CONKPI.
  3) We lock TESTCONS_T.CONKPI_NEW for write, and we lock TESTCONS_T.CONKPI for read.
  4) We do an all-AMPs RETRIEVE step from TESTCONS_T.CONKPI by way of an all-rows scan with a condition of ("TESTCONS_T.CONKPI.DTE_YRWK_NR_KPN = 200316") into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to partition Spool 1 by rowkey. The size of Spool 1 is estimated with high confidence to be 620,622 rows. The estimated time for this step is 11.12 seconds.
  5) We do an all-AMPs MERGE into TESTCONS_T.CONKPI_NEW from Spool 1 (Last Use).
  6) We spoil the parser's dictionary cache for the table.
  -> No rows are returned to the user as the result of statement 1.


There is no mention of partitions. In fact, the data has been REDISTRIBUTED, which I didn't expect and don't understand.

Also, I am running in ANSI mode.

Certainly, if I read the table for week 15, thus:

Select * from TESTCONS_T.CONKPI_NEW where DTE_YRWK_NR_KPN = 200315


I get the expected partition elimination:

Explanation
--------------------------------------------------
 
  1) First, we lock a distinct TESTCONS_T."pseudo table" for read on a RowHash to prevent global deadlock for TESTCONS_T.CONKPI_NEW.
  2) Next, we lock TESTCONS_T.CONKPI_NEW for read.
  3) We do an all-AMPs RETRIEVE step from a single partition of TESTCONS_T.CONKPI_NEW with a condition of ("TESTCONS_T.CONKPI_NEW.DTE_YRWK_NR_KPN = 200315") with a residual condition of ("TESTCONS_T.CONKPI_NEW.DTE_YRWK_NR_KPN = 200315") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 195,383 rows. The estimated time for this step is 9.84 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 9.84 seconds.


Also, if I delete from just one partition, I get an explain that performs partition elimination:
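The delete was presumably of this form (a sketch; the partition condition is taken from the explain that follows):

```sql
-- Delete everything in the week-200316 partition
DELETE FROM TESTCONS_T.CONKPI_NEW
WHERE  DTE_YRWK_NR_KPN = 200316;
```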

Explanation
--------------------------------------------------
 
  1) First, we lock a distinct TESTCONS_T."pseudo table" for write on a RowHash to prevent global deadlock for TESTCONS_T.CONKPI_NEW.
  2) Next, we lock TESTCONS_T.CONKPI_NEW for write.
  3) We do an all-AMPs DELETE from a single partition of TESTCONS_T.CONKPI_NEW with a condition of ("TESTCONS_T.CONKPI_NEW.DTE_YRWK_NR_KPN = 200316") with a residual condition of ("TESTCONS_T.CONKPI_NEW.DTE_YRWK_NR_KPN = 200316").
  -> No rows are returned to the user as the result of statement 1.


but even this took 24 seconds to remove roughly 1.2 million rows (i.e., the two 620K-row inserts combined).

So ... have I partly misunderstood PPIs?!


Dave Clough
Database Designer
Database Design Group



 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 27 Dec 2016