Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Thu, 31 May 2012 @ 13:42:48 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Why incorrect stats are worse than not collecting stats?
 
From:   Dave Wellman

Hi Sughesh,

Firstly, this is not so bad as it used to be. This change came about with TD12.

Before TD12, if there were statistics on a column then they were accepted as correct. However, this could lead to a situation such as:

- the current date is during September

- a query includes selection which looks for rows between 1st August and 31st August (i.e. the previous month)

- the statistics on the relevant column include a maximum value of 31st July (because statistics have not been refreshed)

- the optimizer assumes that there are very few rows (I think one per AMP but not sure about that) that meet the selection criteria although inreality there may be many thousands or even hundreds of thousands.


In this scenario, it is very possible that the plan generated by the optimizer would be 'less than optimal'. Common situations that I've seen is that the spool file built in such a situation is then product joined with something else, the actual processing then taking hours to complete.

With TD12 and upwards, the optimizer now compares any statistics with random amp samples and decides (again, don't know how) if the statistics are out of date. If they are out of date then the statistics are ignored.

So, providing that random amp samples will give you a good plan then no stats is probably as good as stale stats.

Having said that, accurate, up to date stats is one of my markers for a well administered Teradata system.


Cheers,

Dave

Ward Analytics Ltd - Information in motion (www.ward-analytics.com)



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023