Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Wed, 01 Mar 2006 @ 12:21:45 GMT


     
  <Prev Next>  
<<First
<Prev
Next>
Last>>
 


Subj:   Performance Diff using Union ALL and Separate Query
 
From:   Kanthadai Subramanian Suresh

Hi All

Sorry again for such a big mail...

I have query A & query B which when I run separately runs in 3 mins and 1 mins. But when I combine the 2 queries using a union ALL it is running for 30 mins.. Can anyone explain why is it so...

One more thing I tried was instead of Union I used a case statement. It also provides me the result only in 30 mins.

Only thing I can see consistently is the explain does show it will take 30 mins but not able to understand the logic behind it.

The main base table Event_party has got Primary Index ( Event_Id) .It has 180 million records

Collect Statistics has been done on Data_source_cd on Event_party.

The Filter table Data_sources_Filter has got primary index (Data_source_cd).It has 23 records in it. Stats has been collected in it.

The attribute defn Data_Source_cd in Event_party as well as in Filter_table are same.Both are Char(10)

Query A runs in 3 min ..output is 180 million rows

     SELECT
      TGT.Event_Id
      ,TGT.Party_Id
      ,TGT.Contact_Comment_Txt
      ,TGT.Party_Role_Cd
      ,TGT.Party_Event_Start_Dt
      ,TGT.Party_Event_End_Dt
      ,TGT.Ext_Identification_Type_Cd
      ,TGT.Your_Reference
      ,TGT.Our_Reference

     FROM V_EVENT_PARTY  TGT
     WHERE TGT.DATA_SOURCE_CD IS NULL

Explanation
--------------------------------------------------
 
  1)First, we lock EVENT_PARTY for access.  
  2)Next, we do an all-AMPs RETRIEVE step from EVENT_PARTY by way of an all-rows scan with a condition of ("EVENT_PARTY.Data_Source_Cd IS NULL") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 187,790,974 rows. The estimated time for this step is 30 minutes and 3 seconds.  
  3)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 30 minutes and 3 seconds.  


Query b runs in 1 mins output is 0 rows

     SELECT
      TGT.Event_Id
      ,TGT.Party_Id
      ,TGT.Contact_Comment_Txt
      ,TGT.Party_Role_Cd
      , TGT.Start_Dt
      , TGT.End_Dt
      ,TGT.Ext_Identification_Type_Cd
      ,TGT.Your_Reference
      ,TGT.Our_Reference

     FROM V_EVENT_PARTY  TGT
     INNER JOIN DATA_SOURCES_FILTER B
     ON coalesce(trim(TGT.DATA_SOURCE_CD),'') =
     coalesce(trim(B.DATA_SOURCE_CD),'')
     WHERE TGT.DATA_SOURCE_CD IS NOT NULL

Explanation
--------------------------------------------------
 
  1)1) First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for  
  2)Next, we lock Data_sources_Filter for read, and we lock EVENT_PARTY for access.  
  3)We do an all-AMPs RETRIEVE step from Data_sources_Filter by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 322 rows. The estimated time for this step is 0.03 seconds.  
  4)We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to EVENT_PARTY by way of an all-rows scan with a condition of ("NOT (EVENT_PARTY.Data_Source_Cd IS NULL)"). Spool 2 and EVENT_PARTY are joined using a product join, with a join condition of ("(( CASE WHEN (NOT (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )IS NULL )) THEN (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )) ELSE ('') END ))= (( CASE WHEN (NOT (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )IS NULL )) THEN (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )) ELSE ('') END ))"). The input table EVENT_PARTY will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 900,613,873 rows. The estimated time for this step is 2 hours and 24 minutes.  
  5)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 2 hours and 24 minutes.  


Query C When I combine both the queris using Union ALL it runs for 30 mins

     SELECT
      TGT.Event_Id
      ,TGT.Party_Id
      ,TGT.Contact_Comment_Txt
      ,TGT.Party_Role_Cd
      ,TGT.Party_Event_Start_Dt
      ,TGT.Party_Event_End_Dt
      ,TGT.Ext_Identification_Type_Cd
      ,TGT.Your_Reference
      ,TGT.Our_Reference

     FROM V_EVENT_PARTY  TGT
     WHERE TGT.DATA_SOURCE_CD IS NULL

     union all

     SELECT
      TGT.Event_Id
      ,TGT.Party_Id
      ,TGT.Contact_Comment_Txt
      ,TGT.Party_Role_Cd
      , TGT.Start_Dt
      , TGT.End_Dt
      ,TGT.Ext_Identification_Type_Cd
      ,TGT.Your_Reference
      ,TGT.Our_Reference

     FROM V_EVENT_PARTY  TGT
     INNER JOIN DATA_SOURCES_FILTER B
     ON coalesce(trim(TGT.DATA_SOURCE_CD),'') =
     coalesce(trim(B.DATA_SOURCE_CD),'')
     WHERE TGT.DATA_SOURCE_CD IS NOT NULL

Explanation
--------------------------------------------------
 
  1)First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for Data_Sources_Filter  
  2)Next, we lock Data_Sources_Filter for read, and we lock EVENT_PARTY for access.  
  3)We execute the following steps in parallel.  
   
  1) We do an all-AMPs RETRIEVE step from EVENT_PARTY by way of an all-rows scan with a condition of ("EVENT_PARTY.Data_Source_Cd IS NULL") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated with high confidence to be 187,790,974 rows. The estimated time for this step is 30 minutes and 3 seconds.
 
   
  2) We do an all-AMPs RETRIEVE step from BR_FCMDM.B by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 322 rows. The estimated time for this step is 0.03 seconds.
 
  4)We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to EVENT_PARTY by way of an all-rows scan with a condition of ("NOT (EVENT_PARTY.Data_Source_Cd IS NULL)"). Spool 2 and EVENT_PARTY are joined using a product join, with a join condition of ("(( CASE WHEN (NOT (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )IS NULL )) THEN (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )) ELSE ('') END ))= (( CASE WHEN (NOT (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )IS NULL )) THEN (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )) ELSE ('') END ))"). The input table EVENT_PARTY will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 1,088,404,847 rows. The estimated time for this step is 2 hours and 28 minutes.  
  5)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 2 hours and 58 minutes.  


Query used with Case instead of UNION ALL

     SELECT
      TGT.Event_Id
      ,TGT.Party_Id
      ,TGT.Contact_Comment_Txt
      ,TGT.Party_Role_Cd

      ,Case when Tgt.Data_source_cd is null then TGT.Party_Event_Start_Dt
             else TGT.Start_Dt
             end

      ,Case when Tgt.Data_source_cd is null then Tgt.Party_Event_End_Dt
             else TGT.End_Dt
             end

      ,TGT.Ext_Identification_Type_Cd
      ,TGT.Your_Reference
      ,TGT.Our_Reference

     FROM V_EVENT_PARTY  TGT
     INNER JOIN DATA_SOURCES_FILTER B
     ON coalesce(trim(TGT.DATA_SOURCE_CD),'') =
     coalesce(trim(B.DATA_SOURCE_CD),'')

Explanation
--------------------------------------------------
 
  1)First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for Data_Sources_Filter  
  2)Next, we lock Data_Sources_Filter for read, and we lock EVENT_PARTY for access.  
  3)We do an all-AMPs RETRIEVE step from Data_Sources_Filter by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 322 rows. The estimated time for this step is 0.03 seconds.  
  4)We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to EVENT_PARTY by way of an all-rows scan with no residual conditions. Spool 2 and EVENT_PARTY are joined using a product join, with a join condition of ("(( CASE WHEN (NOT (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )IS NULL )) THEN (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )) ELSE ('') END ))= (( CASE WHEN (NOT (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )IS NULL )) THEN (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )) ELSE ('') END ))"). The input table EVENT_PARTY will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 900,613,873 rows. The estimated time for this step is 2 hours and 24 minutes.  
  5)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 2 hours and 24 minutes.  


Thanks & Regards


Cheers

Suresh



     
  <Prev Next>  
<<First
<Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023