Archives of the TeradataForum

Message Posted: Wed, 29 Jun 2005 @ 17:35:22 GMT
Swapnil,

If you have the data already loaded, you can run a query like the following to see how evenly it will distribute:

     /* Just replace A.session_id with your column and
        databasename.tablename with the database and table
        that the data resides in. */
     SELECT HASHAMP(HASHBUCKET(HASHROW(A.session_id)))
           ,COUNT(*)
     FROM   databasename.tablename A
     GROUP BY 1
     ORDER BY 1;

     /* A concatenated key such as
        (A.FLT_DT || A.FLT_NR || A.DPT_ARP_CD || A.ARR_ARP_CD)
        can be used inside HASHROW for compound index checks. */

Also find the highest occurrence counts for the column:

     SELECT account_id
           ,COUNT(*)
     FROM   tablename
     GROUP BY 1
     HAVING COUNT(*) > 500
     ORDER BY 2 DESC;

Basically, you will start to see performance degradation when the rows for a single primary index value no longer fit into one data block. I have found the threshold to be at 2-5 times this rule before major performance problems occur.

To figure out how many rows will fit in a block, divide the block size by the average row length. To find the block size of the table, look at the table definition for the DATABLOCKSIZE, for example:

     DATABLOCKSIZE = 130560 BYTES

If the table does not have an explicit DATABLOCKSIZE, look at the file system setting in DBS Control:

     3. PermDBSize = 127 (sectors)

127 sectors = 130560 bytes, which is the most common setting for Teradata V2R4.1 and above. Look in the manuals for other combinations if you have something else defined.

Determining the average record length is much more complicated because of row overhead, compression, VARCHAR columns and PPI factors, so I don't have time to detail it here. I believe one of the Teradata manuals details how to calculate average row length. (A rough shortcut is sketched after this post.)

So, as an example, if DATABLOCKSIZE = 130560 and the average row length is 100:

     130560 / 100 = 1305 rows per block

This means you could have 1,305 rows with the same primary index value before you would start to experience some performance degradation. In my experience, you would see severe performance degradation on some activity once you reach 5,000 or more rows with the same primary index value in this situation.

The degradation occurs on inserting data into the table, because of duplicate row checks. This is often referred to as a NUPI DUP problem. Two ways to alleviate duplicate row checks are a unique secondary index or a MULTISET table (see the sketch after this post). Note that these do not stop or change the data skew.

This NUPI duplicate problem also often occurs when data has to be redistributed by a column (or columns) that is not nearly unique. For this, try influencing the plan by collecting statistics on the column (example after this post) so the plan will not redistribute the data. Also, queries that use the 'offending' primary index values can be impacted, depending on the request.

I hope this clears up the issue, as not many understood it. Just correcting this type of issue alone will really improve performance for the whole system.

Thanks,

Bob Diehl
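A rough shortcut for the average-row-length step above, offered as a minimal sketch rather than the manual's method: divide the table's current perm space from DBC.TableSize by its row count. The databasename.tablename names are the same placeholders used in the post, and the result includes row and block overhead, so treat it as an approximation.

     /* Approximate average row length: total perm space for the table
        (summed across AMPs) divided by the total row count.
        Includes row/block overhead, so it slightly overstates the
        true row length. */
     SELECT SUM(ts.CurrentPerm) / MAX(rc.row_cnt) AS avg_bytes_per_row
     FROM   DBC.TableSize ts
           ,(SELECT COUNT(*) AS row_cnt
             FROM   databasename.tablename) rc
     WHERE  ts.DataBaseName = 'databasename'
     AND    ts.TableName    = 'tablename';

Dividing the DATABLOCKSIZE by this figure gives a rough rows-per-block number along the lines of the 130560 / 100 = 1305 example above.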
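To illustrate the MULTISET option, here is a minimal sketch of a table definition; the web_clicks table and its columns are made up for the example, and the DATABLOCKSIZE clause simply echoes the value discussed above.

     /* A MULTISET table skips the duplicate row check on insert, so
        loading many rows with the same NUPI value no longer degrades
        into row-by-row comparisons. The skew on session_id itself is
        unchanged. */
     CREATE MULTISET TABLE databasename.web_clicks
          ,DATABLOCKSIZE = 130560 BYTES
          (
           session_id   INTEGER NOT NULL
          ,click_ts     TIMESTAMP(0) NOT NULL
          ,page_id      INTEGER
          )
     PRIMARY INDEX ( session_id );

On a SET table, the other option mentioned in the post, a unique secondary index (CREATE UNIQUE INDEX (col1, col2) ON tablename;), has a similar effect: the system enforces uniqueness through the index instead of comparing each new row against every existing row with the same NUPI value.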
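For the redistribution case, the statistics collection suggested in the post would look something like the following; databasename.tablename and account_id are again placeholders, not real objects.

     /* Collect statistics on the skewed column so the optimizer can
        see how non-unique it is and, ideally, choose a plan that does
        not redistribute spool on that column. */
     COLLECT STATISTICS ON databasename.tablename COLUMN account_id;

HELP STATISTICS databasename.tablename; will show what has already been collected.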