|
Archives of the TeradataForumMessage Posted: Sat, 22 Oct 2005 @ 14:56:08 GMT
The basic answer to your question is that the first request must redistribute all of the values from the join column to the same AMP so that they can join to the column of the other table. If this second column is not the PI of that table, the rows of that table must also be redistributed to the same AMP as the first table rows. This is because an AMP must make the comparsion of the values from different tables and therefore, they must all be together. In essence, it treats the join columns like the PI for the join and they are held in SPOOL during the join. From your question, if you truly need the duplicates for this join, you need more SPOOL space. The GROUP BY aggregates the duplicate values on each AMP - making them unique on the AMP. Then, redistributes at most, one unique value per AMP. Then, once all of the same values from all of the AMPs have been put into SPOOL, GROUP BY goes and aggregates them again to retain only one value and therefore less SPOOL is required. If you were using DISTINCT and had a lot of dups, it is likely to SPOOL error too with your data. This is because DISTINCT redistributes every single value from each row as it is read. Once the same values from all of the AMPs are all together on an AMP, the values are sorted (normally more than one value on each AMP)and then rows are eliminated. You can see both of these behaviors by reading an EXPLAIN. Hope this helps, Michael Larkins
| ||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||
Copyright 2016 - All Rights Reserved | ||||||||||||||||||||||||||||||||||||||||||||||||
Last Modified: 15 Jun 2023 | ||||||||||||||||||||||||||||||||||||||||||||||||