
Archives of the TeradataForum

Message Posted: Tue, 28 Feb 2006 @ 08:35:36 GMT


     

Subj:   Performance problem with View Layering
 
From:   Kanthadai Subramanian Suresh

Hi All

First, I am sorry for the long mail.

I am facing a performance problem when users try to access the data. Initially there were no filters on the users' view; a filter has recently been added to restrict records based on the value in the column Data_Source_Cd.

I have the target EDW table with the following structure:

     CREATE SET TABLE account_event ,NO FALLBACK ,
          NO BEFORE JOURNAL,
          NO AFTER JOURNAL,
          CHECKSUM = DEFAULT
          (
           Event_Id DECIMAL(18,0) NOT NULL,
           Account_Num VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
           Account_Modifier_Num CHAR(18) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
           Account_Event_Type_Cd CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
           Start_Dt DATE FORMAT 'YYYY-MM-DD',
           End_Dt DATE FORMAT 'YYYY-MM-DD',
           Data_Source_Cd CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
           Record_Deleted_Flag BYTEINT NOT NULL DEFAULT 0 ,
           BusinessDate DATE FORMAT 'YYYY-MM-DD',
           Ins_Txf_BatchID INTEGER,
           Upd_Txf_BatchID INTEGER)
     PRIMARY INDEX ( Event_Id );

This table has 15 million records.

The table is accessed through a view, V_ACCOUNT_EVENT, which is a one-to-one view on the table (Account_Event) without any filters.

     REPLACE VIEW V_ACCOUNT_EVENT   (
           Event_Id
          , Account_Num
          , Account_Modifier_Num
          , Account_Event_Type_Cd
          , Start_Dt
          , End_Dt
          , Data_Source_Cd
          , Record_Deleted_Flag
          , BusinessDate
          , Ins_Txf_BatchID
          , Upd_Txf_BatchID

       ) AS
      LOCKING TABLE ACCOUNT_EVENT FOR ACCESS
      SELECT
           Event_Id
          , Account_Num
          , Account_Modifier_Num
          , Account_Event_Type_Cd
          , Start_Dt
          , End_Dt
          , Data_Source_Cd
          , Record_Deleted_Flag
          , BusinessDate
          , Ins_Txf_BatchID
          , Upd_Txf_BatchID

      FROM ACCOUNT_EVENT;

So that the downstream data marts receive only the relevant data from the target EDW, we have a filter table that holds only the required values:

     CREATE SET TABLE data_sources_filter ,NO FALLBACK ,
          NO BEFORE JOURNAL,
          NO AFTER JOURNAL,
          CHECKSUM = DEFAULT
          (
           S_NO INTEGER,
           DATA_SOURCE_CD VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
           DESCRIPTION VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
           START_DT DATE FORMAT 'YY/MM/DD',
           END_DT DATE FORMAT 'YY/MM/DD')
     PRIMARY INDEX ( DATA_SOURCE_CD );

This table has only 23 records.

The users access the data through the following view definition:

     REPLACE VIEW VM_ACCOUNT_EVENT  (
      Event_Id
      ,Account_Num
      ,Account_Modifier_Num
      ,Account_Event_Type_Cd

      ) AS
     SELECT
      TGT.Event_Id
      ,TGT.Account_Num
      ,TGT.Account_Modifier_Num
      ,TGT.Account_Event_Type_Cd

     FROM V_ACCOUNT_EVENT  TGT
     INNER JOIN DATA_SOURCES_FILTER B
     ON COALESCE(TRIM(TGT.DATA_SOURCE_CD),'') =
     COALESCE(TRIM(B.DATA_SOURCE_CD),'' )
     AND TGT.End_Dt is null
     ;
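To make the layering explicit: once V_ACCOUNT_EVENT is resolved down to its base table, a COUNT(*) through VM_ACCOUNT_EVENT effectively asks the optimizer to run something like the following (my manual expansion of the two view definitions above, shown only for illustration):

     /* Manual expansion of SELECT COUNT(*) FROM VM_ACCOUNT_EVENT,
        substituting both view definitions (illustration only)     */
     SELECT COUNT(*)
     FROM ACCOUNT_EVENT TGT
     INNER JOIN DATA_SOURCES_FILTER B
     ON COALESCE(TRIM(TGT.DATA_SOURCE_CD),'') =
        COALESCE(TRIM(B.DATA_SOURCE_CD),'')
     AND TGT.End_Dt IS NULL;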

The problem is that when I run a COUNT(*) query directly against the target table, it gives me the result in 3 minutes.

Whereas when I run the same count through the VM_ACCOUNT_EVENT view, it takes around 20 minutes to return the result.

Can you please help me find a way to correct this problem?

     Sel count(*) from Account_Event;   Result in 3 mins

Explain plan when I query against the table:

Explanation
--------------------------------------------------
 
  1)First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for account_event.  
  2)Next, we lock account_event for read.  
  3)We do an all-AMPs SUM step to aggregate from account_event by way of an all-rows scan with no residual conditions. Aggregate Intermediate Results are computed globally, then placed in Spool 3. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 3 is estimated with high confidence to be 1 row. The estimated time for this step is 2 minutes and 54 seconds.  
  4)We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.03 seconds.  
  5)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1.  


     Sel count(*) from VM_ACCOUNT_EVENT;   Result in 18 mins

Explain plan when I query using the view:

Explanation
--------------------------------------------------
 
  1)First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for BR_FCMDM.B.  
  2)Next, we lock BR_FCMDM.B for read, and we lock ACCOUNT_EVENT for access.  
  3)We do an all-AMPs RETRIEVE step from BR_FCMDM.B by way of an all-rows scan with no residual conditions into Spool 4 (all_amps), which is duplicated on all AMPs. The size of Spool 4 is estimated with high confidence to be 322 rows. The estimated time for this step is 0.03 seconds.  
  4)We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an all-rows scan, which is joined to ACCOUNT_EVENT by way of an all-rows scan with a condition of ("ACCOUNT_EVENT.End_Dt IS NULL"). Spool 4 and ACCOUNT_EVENT are joined using a product join, with a join condition of ("(( CASE WHEN (NOT (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )IS NULL )) THEN (TRIM(BOTH FROM {RightTable}.Data_Source_Cd )) ELSE ('') END ))= (( CASE WHEN (NOT (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )IS NULL )) THEN (TRIM(BOTH FROM {LeftTable}.DATA_SOURCE_CD )) ELSE ('') END ))"). The input table BR_TEDW.T0504_ACCOUNT_EVENT will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 3 is estimated with no confidence to be 234,340,692 rows. The estimated time for this step is 5 minutes and 55 seconds.  
  5)We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan. Aggregate Intermediate Results are computed globally, then placed in Spool 5. The size of Spool 5 is estimated with high confidence to be 1 row. The estimated time for this step is 41.56 seconds.  
  6)We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.03 seconds.  
  7)Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.  
  ->The contents of Spool 1 are sent back to the user as the result of statement 1.  


I have also tried creating a secondary index on the table on Data_Source_Cd, but that does not help either.
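The index was created along these lines (a sketch of the statement; the exact DDL may have differed slightly):

     /* Non-unique secondary index on the filter column -- roughly the
        statement used; it did not change the query behaviour          */
     CREATE INDEX (Data_Source_Cd) ON account_event;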

Any suggestions or ways to solve this problem are very much appreciated.

Thanks in advance


Cheers

Suresh



     