Archives of the TeradataForum

Message Posted: Wed, 23 Feb 2005 @ 17:59:27 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Optimal PI

From:		Frank C. Martinez IV

After sacrificing a goat on the keyboard, you might want to run this little set of queries I've collected. What you're looking for is:

1) If it's a third normal form type table -

1st - Distribution

2nd - Access

2) If it's a fact type table in a star schema -

1st - Access

2nd - Distribution.

     LOCKING Table_in_Question FOR ACCESS
     SELECT COUNT(*)
       FROM Table_in_Question;

     /*****
      * NAME:    ATM Statistics
      * PURPOSE: This calculates the maximum and average number of records per value
      *          and the number of distinct values, for use in index selection analysis.
      *          The 1st two should be as close to below the rule-of-thumb value of 100 or 200
      *          (depending on what era in which you were raised as a T-dba).
      *          The last should be as distinct as possible.
      *****/
     LOCKING Table_in_Question FOR ACCESS
     SELECT AVG(TheCount)       AS AvgRecs,
            MAX(TheCount)       AS MaxRecs,
            COUNT(*)            AS DistinctVals
       FROM (SELECT COUNT(*)    AS TheCount
               FROM Table_in_Question
              GROUP BY PI_Column1, ..., PI_ColumnN) AS Temp;

     /*****
      * NAME:    Distribution  - Mode
      * PURPOSE: This determines the "mode" of the distribution of a specific index.
      * NOTE:    The mode is the score in the population that occurs most frequently.
      *          (Internet Glossary of Statistical Terms:  )
      *          Mode should also be under the rule-of-thumb number.
      *****/
     LOCKING Table_in_Question FOR ACCESS
     SELECT NumberOfRecords, Occurs
       FROM (SELECT NumberOfRecords, COUNT(*) AS Occurs
               FROM (SELECT COUNT(*) AS NumberOfRecords
                       FROM Table_in_Question
                      GROUP BY PI_Column1, ..., PI_ColumnN) AS Temp
              GROUP BY 1) AS Temp2
     QUALIFY RANK(Occurs) = 1;

     /*****
      * NAME:    Distribution - Record Count by AMP
      * PURPOSE: This calculates where records would be placed on the AMPs, based on a specific index.
      *          Good to use to make a histogram.  Avoid lumpiness!
      *****/
     LOCKING Table_in_Question FOR ACCESS
     SELECT HASHAMP(HASHBUCKET(HASHROW(PI_Column1, ..., PI_ColumnN))) AS AmpNo,
            COUNT (*)
       FROM Table_in_Question
      GROUP BY 1
      ORDER BY 1;

     /*****
      * NAME:    Distribution - Skew Factor
      * PURPOSE: This calculates percentage of record skew between the AMP's, using both
      *          the WinDDI definition of skew, or Mike Buckland's definition. Bot of these
      *          were originally defined using CurrentPerm usage.
      * NOTE:    For the fact table of a Star Schema, the second skew should be less than 25.0,
      *          based on discussions with Mike Buckland, aka Teradata Starman.
      *****/
     LOCKING Table_in_Question FOR ACCESS
     SELECT 100 - (100 * AVG(NumberOfRecords)/MAX(NumberOfRecords) (DECIMAL(18,2))) AS WinDDI,
            (100 * MAX(NumberOfRecords) (DECIMAL(18,2)))/MIN(NumberOfRecords) - 100 AS Buckland
       FROM (SELECT HASHAMP(HASHBUCKET(HASHROW(PI_Column1, ..., PI_ColumnN))) AS AmpNo,
                    COUNT (*) AS NumberOfRecords
               FROM Table_in_Question
              GROUP BY 1) AS T1;

     /*****
      * NAME:    Distribution - Exceeds 200 Summary
      * PURPOSE: This shows the number of records with 200 or more records per specific index.
      *          It can be used to compare to the total number of records as a percentage.
      *          Don't even bother to run if you don't get above the magic number in the first
      *          query.
      *****/
     LOCKING Table_in_Question FOR ACCESS
     SELECT SUM(NumberOfRecords)
       FROM (SELECT COUNT(*) AS NumberOfRecords
               FROM Table_in_Question
              GROUP BY PI_Column1, ..., PI_ColumnN
             HAVING NumberOfRecords > 199) AS T1;

Anyway, there's a whole section in the manuals on index selection. If the PI's never joined on, it'll never get any use for access, so it'd better be the only column that distributes well.

(Donning helmet for flying bricks...)

con respeto,


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference