|
|
Archives of the TeradataForum
Message Posted: Wed, 26 Dec 2012 @ 20:11:29 GMT
Subj: | | Re: Resolving AMP CPU Skew Error |
|
From: | | frank.c.martinez |
Do you mean skewing? Oh, you'd like to determine distribution? Well, here's some queries as a Christmas present (or we can argue about
them:
DATABASE DB_XXXX;
/*****
* PURPOSE: The number of records is needed in order to compare the
* to the number of distinct values. This only needs to be
* run once, whereas the other queries can be run for each
* candidate index.
*****/
SELECT SUM(CASE
WHEN XXX_ACCT_ID IS NULL
THEN 1
ELSE 0
END) AS IsNull,
COUNT(*)
FROM T_XXX_PYE;
/*****
* NAME: ATM Statistics
* PURPOSE: This calculates the maximum and average number of records per value
* and the number of distinct values, for use in index selection analysis.
* The average and max should be low (< 200 if possible) and the number of
* distinct values should be high (some large percentage of the number of rows).
*****/;
SELECT AVG(TheCount) AS AvgRecs,
MAX(TheCount) AS MaxRecs,
COUNT(*) AS DistinctVals
FROM (SELECT COUNT(*) AS TheCount
FROM T_XXX_PYE
GROUP BY XXX_ACCT_ID) AS Temp;
/*****
* NAME: Distribution - Mode
* PURPOSE: This determines the "mode" of the distribution of a specific index.
* NOTE: The mode is the score in the population that occurs most frequently.
* (Internet Glossary of Statistical Terms: http://www.animatedsoftware.com/statglos/sgmode.htm)
* It should be low (as the average and max).
*****/
SELECT NumberOfRecords, Occurs
FROM (SELECT NumberOfRecords, COUNT(*) AS Occurs
FROM (SELECT COUNT(*) AS NumberOfRecords
FROM T_XXX_PYE
GROUP BY XXX_ACCT_ID) AS Temp
GROUP BY 1) AS Temp2
QUALIFY RANK(Occurs) = 1;
/*****
* NAME: Distribution - Record Count by AMP
* PURPOSE: This calculates where records would be placed on the AMPs, based on a specific index.
* It can show uneven distribution caused by a specific value or by no value (AMP 0).
*****/
SELECT HASHAMP(HASHBUCKET(HASHROW(XXX_ACCT_ID))) AS AmpNo,
COUNT (*)
FROM T_XXX_PYE
GROUP BY 1
ORDER BY 1;
/*****
* NAME: Distribution - Skew Factor
* PURPOSE: This calculates percentage of record skew between the AMP's, using both
* the WinDDI definition of skew, or Mike Buckland's definition. Both of these
* were originally defined using CurrentPerm usage.
* NOTE: For the fact table of a Star Schema, the second skew should be less than 25%.
*****/
SELECT 100 - (100 * AVG(NumberOfRecords)/MAX(NumberOfRecords) (DECIMAL(18,2))) AS WinDDI,
(100 * MAX(NumberOfRecords) (DECIMAL(18,2)))/MIN(NumberOfRecords) - 100 AS Buckland
FROM (SELECT HASHAMP(HASHBUCKET(HASHROW(XXX_ACCT_ID))) AS AmpNo,
COUNT (*) AS NumberOfRecords
FROM T_XXX_PYE
GROUP BY 1) AS T1;
/*****
* NAME: Distribution - Exceeds 200 Summary
* PURPOSE: This shows the number of records with 200 or more records per specific index.
* It can be used to compare to the total number of records as a percentage.
*****/
SELECT SUM(NumberOfRecords)
FROM (SELECT COUNT(*) AS NumberOfRecords
FROM T_XXX_PYE
GROUP BY XXX_ACCT_ID
HAVING NumberOfRecords > 199) AS T1;
Just some fairly simple stuff I've used for a while. But if Dieter had written them, they'd be elegant (Happy Holidays, Dieter!).
iv
| |