Multi Column Statistics Puzzle [From Ferry, Craig: Wed, 09 Mar 2005 @ 16:25 GMT]

Message Posted: Wed, 09 Mar 2005 @ 16:25:58 GMT

Subj:

Multi Column Statistics Puzzle

From:

Ferry, Craig

Does anyone know why this may happen? I have a simple query below. I ran an explain on the query and then ran the query. I was then trying to improve the speed, so I collected statistics on both columns in the WHERE clause. I then ran the explain again, but instead of being "high confidence" as in the first explain (pre STATS), it came back "low confidence". The number of rows that were returned was 3 (as in the second explain plan), but I am confused as to the confidence level.

TIA

Craig

------------------------------------------------------------- QUERY SELECT * FROM dss_tables.sales_detail WHERE branch_id = 1116 AND invoice_no = '782722' ------------------------------------------------------------- BEFORE 1:24 response Explanation 1) First, we lock a distinct dss_tables."pseudo table" for read on a RowHash to prevent global deadlock for dss_tables.sales_detail. 2) Next, we lock dss_tables.sales_detail for read. 3) We do an all-AMPs RETRIEVE step from dss_tables.sales_detail by way of index # 8 "dss_tables.sales_detail.branch_id = 1116" with a residual condition of ("dss_tables.sales_detail.invoice_no = 782722") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with high confidence to be 108 rows. The estimated time for this step is 2 minutes and 12 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 2 minutes and 12 seconds. ------------------------------------------------------------- COLLECT STATISTICS ON dss_tables.sales_detail COLUMN (branch_id, invoice_no) AFTER w/ multi column statistics :50 response Explanation 1) First, we lock a distinct dss_tables."pseudo table" for read on a RowHash to prevent global deadlock for dss_tables.sales_detail. 2) Next, we lock dss_tables.sales_detail for read. 3) We do an all-AMPs RETRIEVE step from dss_tables.sales_detail by way of index # 8 "dss_tables.sales_detail.branch_id = 1116" with a residual condition of ("dss_tables.sales_detail.invoice_no = 782722") into Spool 1 (group_amps), which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 1 is estimated with low confidence to be 3 rows. The estimated time for this step is 2 minutes and 12 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 2 minutes and 12 seconds. -------------------------------------------------------------

Craig
__________________________
Craig Ferry
Sr Database Administrator
WESCO Distribution, Inc.

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Wed, 09 Mar 2005 @ 16:25:58 GMT