|
|
Archives of the TeradataForum
Message Posted: Wed, 01 Mar 2006 @ 12:21:45 GMT
Subj: | | Performance Diff using Union ALL and Separate Query |
|
From: | | Kanthadai Subramanian Suresh |
Hi All
Sorry again for such a big mail...
I have query A & query B which when I run separately runs in 3 mins and 1 mins. But when I combine the 2 queries using a union ALL it is
running for 30 mins.. Can anyone explain why is it so...
One more thing I tried was instead of Union I used a case statement. It also provides me the result only in 30 mins.
Only thing I can see consistently is the explain does show it will take 30 mins but not able to understand the logic behind it.
The main base table Event_party has got Primary Index ( Event_Id) .It has 180 million records
Collect Statistics has been done on Data_source_cd on Event_party.
The Filter table Data_sources_Filter has got primary index (Data_source_cd).It has 23 records in it. Stats has been collected in it.
The attribute defn Data_Source_cd in Event_party as well as in Filter_table are same.Both are Char(10)
Query A runs in 3 min ..output is 180 million rows
SELECT
TGT.Event_Id
,TGT.Party_Id
,TGT.Contact_Comment_Txt
,TGT.Party_Role_Cd
,TGT.Party_Event_Start_Dt
,TGT.Party_Event_End_Dt
,TGT.Ext_Identification_Type_Cd
,TGT.Your_Reference
,TGT.Our_Reference
FROM V_EVENT_PARTY TGT
WHERE TGT.DATA_SOURCE_CD IS NULL
Explanation -------------------------------------------------- | |
| 1) | First, we lock EVENT_PARTY for access.
| |
| 2) | Next, we do an all-AMPs RETRIEVE step from EVENT_PARTY by way of an all-rows scan with a condition of ("EVENT_PARTY.Data_Source_Cd IS
NULL") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for
synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be
187,790,974 rows. The estimated time for this step is 30 minutes and 3 seconds.
| |
| 3) | Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
| |
| -> | The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 30 minutes and 3 seconds.
| |
Query b runs in 1 mins output is 0 rows
SELECT
TGT.Event_Id
,TGT.Party_Id
,TGT.Contact_Comment_Txt
,TGT.Party_Role_Cd
, TGT.Start_Dt
, TGT.End_Dt
,TGT.Ext_Identification_Type_Cd
,TGT.Your_Reference
,TGT.Our_Reference
FROM V_EVENT_PARTY TGT
INNER JOIN DATA_SOURCES_FILTER B
ON coalesce(trim(TGT.DATA_SOURCE_CD),'') =
coalesce(trim(B.DATA_SOURCE_CD),'')
WHERE TGT.DATA_SOURCE_CD IS NOT NULL
Explanation -------------------------------------------------- | |
| 1) | 1) First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for
| |
| 2) | Next, we lock Data_sources_Filter for read, and we lock EVENT_PARTY for access.
| |
| 3) | We do an all-AMPs RETRIEVE step from Data_sources_Filter by way of an all-rows scan with no residual conditions into Spool 2 (all_amps),
which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 322 rows. The estimated time for this step is 0.03
seconds.
| |
| 4) | We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to EVENT_PARTY by way of an all-rows scan
with a condition of ("NOT (EVENT_PARTY.Data_Source_Cd IS NULL)"). Spool 2 and EVENT_PARTY are joined using a product join, with a join condition
of ("(( CASE WHEN (NOT (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )IS NULL )) THEN (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )) ELSE ('') END
))= (( CASE WHEN (NOT (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )IS NULL )) THEN (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )) ELSE ('') END
))"). The input table EVENT_PARTY will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no
confidence to be 900,613,873 rows. The estimated time for this step is 2 hours and 24 minutes.
| |
| 5) | Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
| |
| -> | The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 2 hours and 24 minutes.
| |
Query C When I combine both the queris using Union ALL it runs for 30 mins
SELECT
TGT.Event_Id
,TGT.Party_Id
,TGT.Contact_Comment_Txt
,TGT.Party_Role_Cd
,TGT.Party_Event_Start_Dt
,TGT.Party_Event_End_Dt
,TGT.Ext_Identification_Type_Cd
,TGT.Your_Reference
,TGT.Our_Reference
FROM V_EVENT_PARTY TGT
WHERE TGT.DATA_SOURCE_CD IS NULL
union all
SELECT
TGT.Event_Id
,TGT.Party_Id
,TGT.Contact_Comment_Txt
,TGT.Party_Role_Cd
, TGT.Start_Dt
, TGT.End_Dt
,TGT.Ext_Identification_Type_Cd
,TGT.Your_Reference
,TGT.Our_Reference
FROM V_EVENT_PARTY TGT
INNER JOIN DATA_SOURCES_FILTER B
ON coalesce(trim(TGT.DATA_SOURCE_CD),'') =
coalesce(trim(B.DATA_SOURCE_CD),'')
WHERE TGT.DATA_SOURCE_CD IS NOT NULL
Explanation -------------------------------------------------- | |
| 1) | First, we lock a distinct "pseudo table" for read on a
RowHash to prevent global deadlock for Data_Sources_Filter
| |
| 2) | Next, we lock Data_Sources_Filter for read, and we lock
EVENT_PARTY for access.
| |
| 3) | We execute the following steps in parallel.
| |
| |
| 1) | We do an all-AMPs RETRIEVE step from EVENT_PARTY by way of an all-rows scan with a condition of ("EVENT_PARTY.Data_Source_Cd IS NULL")
into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized
scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 187,790,974 rows. The
estimated time for this step is 30 minutes and 3 seconds.
| | |
| |
| 2) | We do an all-AMPs RETRIEVE step from BR_FCMDM.B by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is
duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 322 rows. The estimated time for this step is 0.03 seconds.
| | |
| 4) | We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to EVENT_PARTY by way of an all-rows scan
with a condition of ("NOT (EVENT_PARTY.Data_Source_Cd IS NULL)"). Spool 2 and EVENT_PARTY are joined using a product join, with a join condition
of ("(( CASE WHEN (NOT (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )IS NULL )) THEN (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )) ELSE ('') END
))= (( CASE WHEN (NOT (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )IS NULL )) THEN (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )) ELSE ('') END
))"). The input table EVENT_PARTY will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1
(group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no
confidence to be 1,088,404,847 rows. The estimated time for this step is 2 hours and 28 minutes.
| |
| 5) | Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
| |
| -> | The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 2 hours and 58 minutes.
| |
Query used with Case instead of UNION ALL
SELECT
TGT.Event_Id
,TGT.Party_Id
,TGT.Contact_Comment_Txt
,TGT.Party_Role_Cd
,Case when Tgt.Data_source_cd is null then TGT.Party_Event_Start_Dt
else TGT.Start_Dt
end
,Case when Tgt.Data_source_cd is null then Tgt.Party_Event_End_Dt
else TGT.End_Dt
end
,TGT.Ext_Identification_Type_Cd
,TGT.Your_Reference
,TGT.Our_Reference
FROM V_EVENT_PARTY TGT
INNER JOIN DATA_SOURCES_FILTER B
ON coalesce(trim(TGT.DATA_SOURCE_CD),'') =
coalesce(trim(B.DATA_SOURCE_CD),'')
Explanation -------------------------------------------------- | |
| 1) | First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for Data_Sources_Filter
| |
| 2) | Next, we lock Data_Sources_Filter for read, and we lock EVENT_PARTY for access.
| |
| 3) | We do an all-AMPs RETRIEVE step from Data_Sources_Filter by way of an all-rows scan with no residual conditions into Spool 2 (all_amps),
which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 322 rows. The estimated time for this step is 0.03
seconds.
| |
| 4) | We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to EVENT_PARTY by way of an all-rows scan
with no residual conditions. Spool 2 and EVENT_PARTY are joined using a product join, with a join condition of ("(( CASE WHEN (NOT (TRIM(BOTH
FROM {RightTable}.Data_Source_Cd )IS NULL )) THEN (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )) ELSE ('') END ))= (( CASE WHEN (NOT (TRIM(BOTH
FROM {LeftTable}.DATA_SOURCE_CD )IS NULL )) THEN (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )) ELSE ('') END ))"). The input table EVENT_PARTY
will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1 (group_amps), which is built locally on
the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 900,613,873 rows. The
estimated time for this step is 2 hours and 24 minutes.
| |
| 5) | Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
| |
| -> | The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 2 hours and 24 minutes.
| |
Thanks & Regards
Cheers
Suresh
| |