Archives of the TeradataForum

Message Posted: Thu, 05 Feb 2004 @ 17:57:48 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		A weird and unusual circumstance...

From:		Frank C. Martinez IV

Hello all,

We have a 400+ million row table, which we were analyzing to see if we could add some compression. So the stats are recent and it has the following definition and indicies:

CREATE SET TABLE GDYR_EDW.INV ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL
     (
      FACILITY_ID CHAR(4) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'FacilityID' NOT NULL,
      MATL_ID CHAR(18) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'MaterialID' NOT NULL,
      DAY_DT DATE FORMAT 'yyyy-mm-dd' TITLE 'DayDT' NOT NULL,
      DEF_QTY DECIMAL(15,3) TITLE 'DeferredQTY' NOT NULL COMPRESS 0.000 ,
      BACK_ORDER_QTY DECIMAL(15,3) TITLE 'BackorderedQTY' NOT NULL COMPRESS 0.000 ,
      NET_BACK_ORDER_QTY DECIMAL(15,3) TITLE 'NetBackOrderedQTY' NOT NULL COMPRESS 0.000 ,
      COMMIT_QTY DECIMAL(15,3) TITLE 'CommitedQTY' NOT NULL COMPRESS 0.000,
      UN_COMMIT_QTY DECIMAL(15,3) TITLE 'UnCommittedQTY' NOT NULL COMPRESS 0.000 ,
      TOT_QTY DECIMAL(15,3) TITLE 'TotalQTY' NOT NULL COMPRESS 0.000 ,
      AVAIL_TO_PROM_QTY DECIMAL(15,3) TITLE 'AvailableToPromise' NOT NULL COMPRESS 0.000 ,
      STO_IN_PROS_OUT_QTY DECIMAL(15,3) TITLE 'STOInprocessOutQTY' NOT NULL COMPRESS 0.000 ,
      STO_IN_PROS_IN_QTY DECIMAL(15,3) TITLE 'STOInprocessInQTY' NOT NULL COMPRESS 0.000 ,
      IN_TRANS_QTY DECIMAL(15,3) TITLE 'InTransitQTY' NOT NULL COMPRESS 0.000 ,
      OPEN_FACT_UNIT_QTY DECIMAL(15,3) TITLE 'OpenFactoryUnitsQTY' NOT NULL COMPRESS 0.000 ,
      FACT_SHR DECIMAL(15,3) TITLE 'FactoryShare' NOT NULL COMPRESS 0.000,
      IN_PROS_QTY DECIMAL(15,3) TITLE 'InProcessQTY' NOT NULL COMPRESS 0.000 ,
      CUST_SAFE_STK_QTY DECIMAL(15,3) TITLE 'CustomerSafetyStockQTY' COMPRESS 0.000 ,
      RSVR_QTY DECIMAL(15,3) TITLE 'ReservedQTY' NOT NULL COMPRESS 0.000 ,
      QUAL_INSP_QTY DECIMAL(15,3) TITLE 'QualityInspectionQTY' NOT NULL COMPRESS 0.000 ,
      BLOCKED_STK_QTY DECIMAL(15,3) TITLE 'BlockedStockQTY' NOT NULL COMPRESS 0.000 ,
      RSTR_QTY DECIMAL(15,3) TITLE 'RestrictedQTY' NOT NULL COMPRESS 0.000,
      STK_RET_QTY DECIMAL(15,3) TITLE 'StockReturnsQTY' NOT NULL COMPRESS 0.000 ,
      SAFE_STK_QTY DECIMAL(15,3) TITLE 'SafetyStockQTY' NOT NULL COMPRESS 0.000 ,
      SRC_SYS_ID BYTEINT TITLE 'SrcSysID' NOT NULL,
      EDW_JOB_ID INTEGER TITLE 'EdwJobID' NOT NULL)
PRIMARY INDEX ( FACILITY_ID ,MATL_ID ,DAY_DT )
INDEX ( DAY_DT );

04/01/31        19:09:47              449    FACILITY_ID
04/01/31        19:14:58           72,177    MATL_ID
04/01/31        19:15:07              520    DAY_DT
04/01/31        19:17:18            7,063    FACT_SHR
04/01/31        19:19:27                2    SRC_SYS_ID
04/01/31        19:35:34      447,181,594    FACILITY_ID,MATL_ID,DAY_DT

When we do the following simple query, it takes the whole table, puts in into spool and then does a sample afterwards:

SELECT  day_dt
FROM    gdyr_edw.inv
sample 10 ;

3) We do an all-AMPs RETRIEVE step from gdyr_edw.inv by way of a traversal of index # 4 without accessing the base table with no residual conditions into Spool 2, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 2 is estimated with high confidence to be 447,181,594 rows. The estimated time for this step is 1 hour and 3 minutes.

4) We do an all-AMPs SAMPLING step from Spool 2 (Last Use) by way of an all-rows scan into Spool 1, which is built locally on the AMPs. Samples are specified as a number of rows.

whereas, when we just sample on any other colum without the index, we get an explain that shows the sample from the table directly:

SELECT  edw_job_id
FROM    gdyr_edw.inv
sample 10;

3) We do an all-AMPs SAMPLING step from gdyr_edw.inv by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. Samples are specified as a number of rows.

Which of course, runs a whole lot faster (yes, we tested it, of course) than the first. We got around this by writing the first query as:

SELECT  day_dt
FROM    gdyr_edw.inv
group by 1
sample 10;

3) We do a SUM step to aggregate from gdyr_edw.inv by way of a traversal of index # 4 without accessing the base table with no residual conditions, and the grouping identifier in field 1027. Aggregate Intermediate Results are computed globally, then placed in Spool 4. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 4 is estimated with low confidence to be 520 rows.

4) We do an all-AMPs SAMPLING step from Spool 4 (Last Use) by way of an all-rows scan into Spool 1, which is built locally on the AMPs. Samples are specified as a number of rows.

Of course, the first query takes forever, the last is even quicker than the second, because it only goes out to the index. But why is the first transversing the index and then loading the whole table into spool? Is it lost? Did it get confused? Does it just not like me? Anyway, please, all you brilliant Teradatyls, please help.


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference