Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 16 Dec 2008 @ 13:02:56 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Performance related issues in full statistics
 
From:   Joerg Mutsch

Hi Rit,

I am glad to help. The example I provided you is a full collecting statistics on the table. If you use these commands you will always get a full statistic information on your column or column combination. So after the process is finished the optimizer knows exactly the number of distinct values.

Sampling of Statistics is only collecting partially the statistics of a column or column combination on your table to give the optimizer at least a 'feeling' about the distinct values of your columns and their distributioin among the AMPS

However statistics are important to provide the optimizer informatioin to build the most optimized access plan to your table esp. when you join this table with other tables.

The command for collecting only sample statistics would be also slightly different: collect statistics using sample .....

This is what the documentation says (taken from Data Definition Statements):

to scan a system-selected percentage of table rows rather than scanning all rows for a table.

You can only specify this clause if you also specify an explicit column or index.

In other words, you cannot specify a USING SAMPLE clause for a standard recollection of statistics on implicitly specified column and index sets (see "Collecting Statistics When No COLUMN or INDEX Clause Is Specified" on page 1306 for details about recollecting statistics).

You cannot specify a USING SAMPLE clause with global temporary tables, hash indexes, or join indexes.

You cannot specify a USING SAMPLE clause if any column_name in the specified COLUMN set is a member of the partitioning column set for a table defined with a partitioned primary index. This restriction does not apply to collecting sampled statistics on a specified INDEX column set. If sampled statistics have already been collected on such a column and you recollect statistics on it, the system collects full statistics on the column: it does not use sampling to gather the new statistics.

This option is ignored for single-column statistics requests for the PARTITION column of a PPI table. Instead, the system automatically increases the percentage to 100 and collects full statistics.

You should only use this option for unique or highly singular columns and indexes, or only on extremely large tables. In this case, extremely large means having cardinalities in the tens of billions of rows. See "Reducing the Cost of Collecting Statistics by Sampling" on page 1297 for more information.


I never collected only samples to be honest. Based on the documentation a percentage need to be defined within the system to improve the performance on that. But dont ask me where this will be set and what's the default value, since I have no clue about that.


Regards

Joerg



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023