Re: Resolving AMP CPU Skew Error [From frank.c.martinez: Wed, 26 Dec 2012 @ 20:11 GMT]

Message Posted: Wed, 26 Dec 2012 @ 20:11:29 GMT

Subj:

Re: Resolving AMP CPU Skew Error

From:

frank.c.martinez

Do you mean skewing? Oh, you'd like to determine distribution? Well, here's some queries as a Christmas present (or we can argue about them:

DATABASE DB_XXXX; /***** * PURPOSE: The number of records is needed in order to compare the * to the number of distinct values. This only needs to be * run once, whereas the other queries can be run for each * candidate index. *****/ SELECT SUM(CASE WHEN XXX_ACCT_ID IS NULL THEN 1 ELSE 0 END) AS IsNull, COUNT(*) FROM T_XXX_PYE; /***** * NAME: ATM Statistics * PURPOSE: This calculates the maximum and average number of records per value * and the number of distinct values, for use in index selection analysis. * The average and max should be low (< 200 if possible) and the number of * distinct values should be high (some large percentage of the number of rows). *****/; SELECT AVG(TheCount) AS AvgRecs, MAX(TheCount) AS MaxRecs, COUNT(*) AS DistinctVals FROM (SELECT COUNT(*) AS TheCount FROM T_XXX_PYE GROUP BY XXX_ACCT_ID) AS Temp; /***** * NAME: Distribution - Mode * PURPOSE: This determines the "mode" of the distribution of a specific index. * NOTE: The mode is the score in the population that occurs most frequently. * (Internet Glossary of Statistical Terms: http://www.animatedsoftware.com/statglos/sgmode.htm) * It should be low (as the average and max). *****/ SELECT NumberOfRecords, Occurs FROM (SELECT NumberOfRecords, COUNT(*) AS Occurs FROM (SELECT COUNT(*) AS NumberOfRecords FROM T_XXX_PYE GROUP BY XXX_ACCT_ID) AS Temp GROUP BY 1) AS Temp2 QUALIFY RANK(Occurs) = 1; /***** * NAME: Distribution - Record Count by AMP * PURPOSE: This calculates where records would be placed on the AMPs, based on a specific index. * It can show uneven distribution caused by a specific value or by no value (AMP 0). *****/ SELECT HASHAMP(HASHBUCKET(HASHROW(XXX_ACCT_ID))) AS AmpNo, COUNT (*) FROM T_XXX_PYE GROUP BY 1 ORDER BY 1; /***** * NAME: Distribution - Skew Factor * PURPOSE: This calculates percentage of record skew between the AMP's, using both * the WinDDI definition of skew, or Mike Buckland's definition. Both of these * were originally defined using CurrentPerm usage. * NOTE: For the fact table of a Star Schema, the second skew should be less than 25%. *****/ SELECT 100 - (100 * AVG(NumberOfRecords)/MAX(NumberOfRecords) (DECIMAL(18,2))) AS WinDDI, (100 * MAX(NumberOfRecords) (DECIMAL(18,2)))/MIN(NumberOfRecords) - 100 AS Buckland FROM (SELECT HASHAMP(HASHBUCKET(HASHROW(XXX_ACCT_ID))) AS AmpNo, COUNT (*) AS NumberOfRecords FROM T_XXX_PYE GROUP BY 1) AS T1; /***** * NAME: Distribution - Exceeds 200 Summary * PURPOSE: This shows the number of records with 200 or more records per specific index. * It can be used to compare to the total number of records as a percentage. *****/ SELECT SUM(NumberOfRecords) FROM (SELECT COUNT(*) AS NumberOfRecords FROM T_XXX_PYE GROUP BY XXX_ACCT_ID HAVING NumberOfRecords > 199) AS T1;

Just some fairly simple stuff I've used for a while. But if Dieter had written them, they'd be elegant (Happy Holidays, Dieter!).

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Wed, 26 Dec 2012 @ 20:11:29 GMT