Archives of the TeradataForum

Message Posted: Fri, 22 Aug 2008 @ 16:12:42 GMT
Your analysis is correct on the space side. However, your example does not really give a case where this would happen, as it is not skewed enough. The outliers (PI values with more rows than average) usually never land on the same AMP; they get spread evenly across the AMPs, so you usually aren't skewed overall. Other tables in the system will also skew to different AMPs, so overall you may still be pretty even.

The rule of thumb I use is that you can have 1000 to 5000 rows per PI value and not really notice the skew on inserts or space. This number will vary with the size of the system.

The real rule on performance degradation is when the total rows per PI value cannot fit into one block. So if your blocksize is 130560 bytes and the average row length is 200 bytes, you can have about 652 rows per PI value before you start to see performance problems. In my experience you can go quite a bit higher, because the performance gain of not redistributing the table on joins outweighs the duplicate-row check on loads. Also, the data usually stays pretty even across the system, so each AMP is still doing about the same workload.

You really need to evaluate your own situation. Verify these things, then decide which is most important to you:

1) space usage
2) load performance
3) query performance - for this one you need to look at your mix of queries to choose the best option.

From what you have described, I would choose the PI on Order Line as Order_ID.

Thanks,
Bob Diehl
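The rows-per-block arithmetic above can be sketched as a small helper. This is only an illustration of the estimate in the post (block size divided by average row length); the function name is made up here, and real Teradata block usage also involves row overhead that this back-of-the-envelope figure ignores.

```python
def rows_per_pi_value_limit(block_size_bytes, avg_row_len_bytes):
    """Rough ceiling on rows per PI value before they spill past one block.

    Mirrors the estimate in the post: block size / average row length.
    Ignores per-row and per-block overhead, so treat it as an approximation.
    """
    return block_size_bytes // avg_row_len_bytes

# Figures from the post: 130560-byte blocks, 200-byte average rows.
print(rows_per_pi_value_limit(130560, 200))  # about 652 rows per PI value
```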
Copyright 2016 - All Rights Reserved
Last Modified: 15 Jun 2023