|
|
Archives of the TeradataForum
Message Posted: Wed, 09 May 2001 @ 20:55:03 GMT
Subj: | | Re: Bad Lumps |
|
From: | | Frank Martinez |
Here's a couple more SQL's for the same purpose:
/*****
* NAME: Distribution - Record Count
* PURPOSE: This calculates where records would be placed on the AMPs,
based on a specific key.
*****/
SELECT HASHAMP(HASHBUCKET(HASHROW(OrgCode, AgencyCode, FiscalYear, ObjectCode))) AS AmpNo,
COUNT (*)
FROM budget
GROUP BY 1
ORDER BY 1;
/*****
* NAME: Distribution - Mode
* PURPOSE: This determines the "mode" of the distribution of a candidate index.
* NOTE: The mode is the score in the population that occurs most frequently.
* (Internet Glossary of Statistical Terms: http://www.animatedsoftware.com/statglos/statglos.htm)
*****/
SELECT NumberOfRecords, Occurs
FROM (SELECT NumberOfRecords, COUNT(*) AS Occurs
FROM (SELECT COUNT(*) AS NumberOfRecords
FROM budget
GROUP BY OrgCode, AgencyCode, FiscalYear, ObjectCode) AS Temp
GROUP BY 1) AS Temp2
QUALIFY RANK(Occurs) = 1;
/*****
* NAME: ATM Statistics
* PURPOSE:
*****/
SELECT MAX(TheCount) AS MaxRecs,
AVG(TheCount) AS AvgRecs,
COUNT(*) AS DistinctVals
FROM (SELECT COUNT(*) AS TheCount
FROM budget
GROUP BY OrgCode, AgencyCode, FiscalYear, ObjectCode) AS Temp;
The mode, max and number of distinct values are used in (what used to be called) the ATM process (gosh, I'm old). The max and mode
should be (current rule of thumb) less than 100, and the number of distinct values high. Oh well, somebody else do the lecture on how
Teradata works, I'm tired.
iv
Frank C. Martinez IV
| |