Archives of the TeradataForum

Message Posted: Wed, 09 May 2001 @ 20:55:03 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Bad Lumps

From:		Frank Martinez

Here's a couple more SQL's for the same purpose:

/*****
 * NAME:        Distribution - Record Count
 * PURPOSE:     This calculates where records would be placed on the AMPs,
                based on a specific key.
 *****/
SELECT HASHAMP(HASHBUCKET(HASHROW(OrgCode, AgencyCode, FiscalYear, ObjectCode))) AS AmpNo,
       COUNT (*)
  FROM budget
 GROUP BY 1
 ORDER BY 1;

/*****
 * NAME:        Distribution  - Mode
 * PURPOSE:     This determines the "mode" of the distribution of a candidate index.
 * NOTE:        The mode is the score in the population that occurs most frequently.
 *              (Internet Glossary of Statistical Terms: http://www.animatedsoftware.com/statglos/statglos.htm)
 *****/
SELECT NumberOfRecords, Occurs
  FROM (SELECT NumberOfRecords, COUNT(*) AS Occurs
          FROM (SELECT COUNT(*) AS NumberOfRecords
                  FROM budget
                 GROUP BY OrgCode, AgencyCode, FiscalYear, ObjectCode) AS Temp
         GROUP BY 1) AS Temp2
QUALIFY RANK(Occurs) = 1;

/*****
 * NAME:        ATM Statistics
 * PURPOSE:
 *****/
SELECT MAX(TheCount)     AS MaxRecs,
       AVG(TheCount)     AS AvgRecs,
       COUNT(*)          AS DistinctVals
  FROM (SELECT COUNT(*)  AS TheCount
          FROM budget
         GROUP BY OrgCode, AgencyCode, FiscalYear, ObjectCode) AS Temp;

The mode, max and number of distinct values are used in (what used to be called) the ATM process (gosh, I'm old). The max and mode should be (current rule of thumb) less than 100, and the number of distinct values high. Oh well, somebody else do the lecture on how Teradata works, I'm tired.

Frank C. Martinez IV


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference