
Archives of the TeradataForumMessage Posted: Wed, 09 May 2001 @ 19:46:49 GMT
Hi, I think this refers to 'lumpy data' which is where you have a large variation in the number of rows per rowhash. For instance, some rowhash values may only have 1 or 2 rows, whilst others may have 100200 rows. This is often accompanied by skewed data, but it is possible for the table as a whole to be more or less evenly distributed ACROSS the amps, but WITHIN an amp there can be a large disparity between rowhashes when it comes to the number of data rows per rowhash. To show this, try the following query (assuming the col1, col2 is the primary index of the table concerned); Sel ampnbr (title 'amp number') ,max(rowcount) (title 'max rows per hash bucket') ,min(rowcount) (title 'min rows per hash bucket') ,sum(rowcount) (title 'rows on amp') from (sel hashamp(hashbucket(hashrow(col1,col2))),hashbucket(hashrow(col1,col2)),count(*) from < table > group by 1,2) as a (ampnbr,hashbucketnbr,rowcount) group by 1 order by 4 desc; This will give you a report with one line per amp which tells you the number of rows per amp for this table. Comparing the number for 'rows on amp' on the first line to the same number on the last line will give you an indication of how skewed your table is. In a perfect world, the first and last numbers in this column will be the same. Comparing the numbers for 'max rows per hash bucket' and 'min rows per hash bucket' on any single line will tell you how lumpy your data is. In a perfect world, these two numbers on any one line will be the same. If this shows you that you have lumpy data then solving it is a case of changing the primary index of the table. Basically, you've got a hashing problem, which means you have to change an index definition. If that doesn't make sense, feel free to contact me. Cheers, Dave
 
 
Copyright 2016  All Rights Reserved  
Last Modified: 28 Jun 2020  