Archives of the TeradataForum
Message Posted: Fri, 14 Mar 2008 @ 14:44:29 GMT
Please forgive me for being a "Doubting Thomas" here, but as mentioned previously, the larger spool utilization for the "non-PI" query is the establishment and building of SPOOL5 using redistribution of some number of rows from their original AMP into the AMPs doing the join processing. This is all overhead work so that the join can be done. Where as the PI join does not need to read, redistribute, spool and sort the rows from PM_EDW_MSTR_D.d_ORGANIZATION. With the estimated number of rows going into spool from the join step, this seems to be a large number of rows (confirmed by your statement regarding spool utilization).
Seeing one step of a plan is like reading one sentence out of the Bible and using it to justify some behavior where as reading the sentences before it and after it point to the out of context and in correct conclusion. Everything is better understood when looking at the big picture instead of focusing on a single aspect.
Also, you seem to indicate that the actual runtime (not just 2 minutes versus 4 minutes in the one step we wee in the EXPLAIN) was longer for the PI join. Based on the above thoughts this is a little hard to understand. Is it possible that the non-PI join ran at a time of low system utilization and the PI join query experienced a lot of resource contention due to a high level of system utilization that could cause it to run longer? It would be good to have an apples to apples comparison.
The fastest join possible should be if both the organization code and the party id were the PI of the tables being joined. This is because neither table would need to be redistributed, spooled and sorted in order for the join step to execute directly from sorted table to sorted table.
I look forward to you providing more details on this query.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|