Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Wed, 09 Mar 2005 @ 16:25:58 GMT


     
  <Prev Next>  
<<First
<Prev
Next> Last>>  


Subj:   Multi Column Statistics Puzzle
 
From:   Ferry, Craig

Does anyone know why this may happen? I have a simple query below. I ran an explain on the query and then ran the query. I was then trying to improve the speed, so I collected statistics on both columns in the WHERE clause. I then ran the explain again, but instead of being "high confidence" as in the first explain (pre STATS), it came back "low confidence". The number of rows that were returned was 3 (as in the second explain plan), but I am confused as to the confidence level.


TIA

Craig

        -------------------------------------------------------------

     QUERY

     SELECT *
     FROM dss_tables.sales_detail
     WHERE branch_id = 1116
     AND invoice_no = '782722'

        -------------------------------------------------------------

     BEFORE

     1:24 response

     Explanation
       1) First, we lock a distinct dss_tables."pseudo table" for read on a
          RowHash to prevent global deadlock for dss_tables.sales_detail.
       2) Next, we lock dss_tables.sales_detail for read.
       3) We do an all-AMPs RETRIEVE step from dss_tables.sales_detail by
          way of index # 8 "dss_tables.sales_detail.branch_id = 1116" with a
          residual condition of ("dss_tables.sales_detail.invoice_no =
          782722") into Spool 1 (group_amps), which is built locally on the
          AMPs.  The input table will not be cached in memory, but it is
          eligible for synchronized scanning.  The size of Spool 1 is
          estimated with high confidence to be 108 rows.  The estimated time
          for this step is 2 minutes and 12 seconds.
       4) Finally, we send out an END TRANSACTION step to all AMPs involved
          in processing the request.
       -> The contents of Spool 1 are sent back to the user as the result of
          statement 1.  The total estimated time is 2 minutes and 12 seconds.

        -------------------------------------------------------------

     COLLECT STATISTICS ON dss_tables.sales_detail COLUMN (branch_id,
     invoice_no)

     AFTER w/ multi column statistics

     :50 response

     Explanation
       1) First, we lock a distinct dss_tables."pseudo table" for read on a
          RowHash to prevent global deadlock for dss_tables.sales_detail.
       2) Next, we lock dss_tables.sales_detail for read.
       3) We do an all-AMPs RETRIEVE step from dss_tables.sales_detail by
          way of index # 8 "dss_tables.sales_detail.branch_id = 1116" with a
          residual condition of ("dss_tables.sales_detail.invoice_no =
          782722") into Spool 1 (group_amps), which is built locally on the
          AMPs.  The input table will not be cached in memory, but it is
          eligible for synchronized scanning.  The size of Spool 1 is
          estimated with low confidence to be 3 rows.  The estimated time
          for this step is 2 minutes and 12 seconds.
       4) Finally, we send out an END TRANSACTION step to all AMPs involved
          in processing the request.
       -> The contents of Spool 1 are sent back to the user as the result of
          statement 1.  The total estimated time is 2 minutes and 12 seconds.

        -------------------------------------------------------------

Craig
__________________________
Craig Ferry
Sr Database Administrator
WESCO Distribution, Inc.



     
  <Prev Next>  
<<First
<Prev
Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023