Home Page for the TeradataForum

Archives of the TeradataForum

Message Posted: Wed, 26 Dec 2012 @ 20:11:29 GMT

  <Prev Next>   <<First <Prev Next> Last>>  

Subj:   Re: Resolving AMP CPU Skew Error
From:   frank.c.martinez

Do you mean skewing? Oh, you'd like to determine distribution? Well, here's some queries as a Christmas present (or we can argue about them:

      * PURPOSE: The number of records is needed in order to compare the
      *          to the number of distinct values.  This only needs to be
      *          run once, whereas the other queries can be run for each
      *          candidate index.
                   WHEN XXX_ACCT_ID   IS NULL
                      THEN 1
                      ELSE 0
                 END) AS IsNull,
       FROM T_XXX_PYE;

      * NAME:    ATM Statistics
      * PURPOSE: This calculates the maximum and average number of records per value
      *          and the number of distinct values, for use in index selection analysis.
      *          The average and max should be low (< 200 if possible) and the number of
      *          distinct values should be high (some large percentage of the number of rows).
     SELECT AVG(TheCount)    AS AvgRecs,
            MAX(TheCount)    AS MaxRecs,
            COUNT(*)         AS DistinctVals
       FROM (SELECT COUNT(*) AS TheCount
               FROM T_XXX_PYE
              GROUP BY XXX_ACCT_ID) AS Temp;

      * NAME:    Distribution  - Mode
      * PURPOSE: This determines the "mode" of the distribution of a specific index.
      * NOTE:    The mode is the score in the population that occurs most frequently.
      *          (Internet Glossary of Statistical Terms: http://www.animatedsoftware.com/statglos/sgmode.htm)
      *          It should be low (as the average and max).
     SELECT NumberOfRecords, Occurs
       FROM (SELECT NumberOfRecords, COUNT(*) AS Occurs
               FROM (SELECT COUNT(*) AS NumberOfRecords
                       FROM T_XXX_PYE
                      GROUP BY XXX_ACCT_ID) AS Temp
              GROUP BY 1) AS Temp2
     QUALIFY RANK(Occurs) = 1;

      * NAME:    Distribution - Record Count by AMP
      * PURPOSE: This calculates where records would be placed on the AMPs, based on a specific index.
      *          It can show uneven distribution caused by a specific value or by no value (AMP 0).
            COUNT (*)
      GROUP BY 1
      ORDER BY 1;

      * NAME:    Distribution - Skew Factor
      * PURPOSE: This calculates percentage of record skew between the AMP's, using both
      *          the WinDDI definition of skew, or Mike Buckland's definition.  Both of these
      *          were originally defined using CurrentPerm usage.
      * NOTE:    For the fact table of a Star Schema, the second skew should be less than 25%.
     SELECT 100 - (100 * AVG(NumberOfRecords)/MAX(NumberOfRecords) (DECIMAL(18,2))) AS WinDDI,
            (100 * MAX(NumberOfRecords) (DECIMAL(18,2)))/MIN(NumberOfRecords) - 100 AS Buckland
                    COUNT (*) AS NumberOfRecords
               FROM T_XXX_PYE
              GROUP BY 1) AS T1;

      * NAME:    Distribution - Exceeds 200 Summary
      * PURPOSE: This shows the number of records with 200 or more records per specific index.
      *          It can be used to compare to the total number of records as a percentage.
     SELECT SUM(NumberOfRecords)
       FROM (SELECT COUNT(*) AS NumberOfRecords
               FROM T_XXX_PYE
              GROUP BY XXX_ACCT_ID
             HAVING NumberOfRecords > 199) AS T1;

Just some fairly simple stuff I've used for a while. But if Dieter had written them, they'd be elegant (Happy Holidays, Dieter!).


  <Prev Next>   <<First <Prev Next> Last>>  
  Top Home Privacy Feedback  
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 28 Jun 2020