Archives of the TeradataForum

Message Posted: Fri, 22 Aug 2008 @ 16:12:42 GMT


     


Subj:   Re: Skewed Table - Space usage v/s AMP Colocation
 
From:   Diehl, Robert

Your analysis is correct on the space side. However, your example is not really a case where this would happen, as it is not skewed enough. The outliers (PI values with more rows than average) rarely land on the same AMP; they tend to get spread evenly across the AMPs, so you usually aren't skewed overall. Other tables in the system will also skew toward different AMPs, so overall you may still be pretty even.
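
To make the "outliers spread across AMPs" point concrete, here is a rough Python sketch. It is only an illustration: the MD5-based amp_for function is a stand-in I made up for this example and is not Teradata's row hash, and the row counts per PI value are invented sample data.

```python
import hashlib
import random


def amp_for(pi_value, n_amps):
    """Map a PI value to an AMP number.

    ASSUMPTION: MD5 is only a stand-in for a well-mixing hash;
    it is not the hash function Teradata actually uses.
    """
    digest = hashlib.md5(str(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_amps


def rows_per_amp(rows_per_pi, n_amps):
    """Sum the row counts of every PI value that lands on each AMP."""
    loads = [0] * n_amps
    for pi_value, n_rows in rows_per_pi.items():
        loads[amp_for(pi_value, n_amps)] += n_rows
    return loads


def skew_ratio(loads):
    """Busiest AMP divided by the average AMP (1.0 means perfectly even)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical table: most PI values have a few rows, some have many.
    rows_per_pi = {pi: rng.choice([1, 2, 3, 5, 50]) for pi in range(100_000)}
    loads = rows_per_amp(rows_per_pi, n_amps=100)
    print("skew ratio:", round(skew_ratio(loads), 3))
```

With many distinct PI values and a reasonable hash, the heavy values should scatter over different AMPs, so the ratio stays near 1.0. It is only when a few PI values hold a large share of the table that the ratio climbs.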

The 'rule of thumb' I use is that you can have 1000 to 5000 rows per PI value and you will not really notice the skew on inserts and space. This number will vary based on the size of the system. The real rule on performance degradation is that it starts when the total rows per PI value can no longer fit into one block. So if your block size is 130560 bytes and the average row length is 200 bytes, you can have about 652 rows per PI value before you start to see performance problems. In my experience you can go quite a bit higher, as the performance gain from not redistributing the table on joins outweighs the cost of the duplicate row check on loads. Also, the data usually stays pretty even across the system, so each AMP is still doing about the same workload. You really need to evaluate this for your own situation.
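
The block arithmetic above can be checked with a couple of lines of Python. This is a simplified sketch: the function name is mine, and it ignores block header and per-row overhead, so the real limit will be somewhat lower.

```python
def max_rows_per_block(block_size_bytes, avg_row_bytes):
    """Whole rows of the given average length that fit in one block.

    ASSUMPTION: block and row overhead are ignored, so treat the
    result as an upper bound rather than an exact figure.
    """
    if block_size_bytes <= 0 or avg_row_bytes <= 0:
        raise ValueError("sizes must be positive")
    return block_size_bytes // avg_row_bytes


# The figures from the paragraph above: 130560-byte block, 200-byte rows.
print(max_rows_per_block(130560, 200))  # 652
```

Plug in your own block size and average row length to get a rough threshold for rows per PI value on your tables.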

You need to check each of these things, then decide which is most important to you.

1) space usage

2) load performance

3) query performance (for this one you need to look at your mix of queries to make the best choice)


From what you have described, I would choose Order_ID as the PI on Order Line.


Thanks,

Bob Diehl
Travelocity



     
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023