Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Tue, 17 Oct 2006 @ 10:29:48 GMT


     
  <Prev Next>   <<First <Prev
Next>
Last>>
 


Subj:   Re: Way to Copy Statistics.
 
From:   Dittrich, Klaus

Eric,

here's a suggestion how to copy Statistics from one table to another (the same way how TSET might work), using the Help-Statistics command

It's still in prototype status, but already working in most cases. I added a detailed example so you may see if this is what you are searching for (but I'm sorry that it got a rather long posting this way...)

The "Help Statistics" Command returns results in 3 different forms:

- Numeric Index Column (the most simple one)

- Non-numeric Index Column, any Single-Column-Stats

- Multi-Column Stats or Indexes (each column value is stored in a separate field)


The results can be re-ordered and used as input for a "Collect Stats" command to transfer the statistics.

I will show the method for a simple example (a primary index and a single-numeric-column, only 1 value => one 1 interval), followed by some remarks about the more complex types.

a) Defining an Example

     create table dwh_meta_ent.bla (a integer, b integer) primary index (a);
     insert dwh_meta_ent.bla values (1001,1002);
     collect stats dwh_meta_ent.bla index (a);
     collect stats dwh_meta_ent.bla column b;

     create table dwh_meta_ent.bla2 as dwh_meta_ent.bla with no data;

b) Query DBC.StatsX

Query this View to get the Index-Type (any Index or Multi-Column-Stats or NULL), the Columnlist and the Field-Count; this information is needed to construct the correct Collect-Stats-Command and decide which kind of stats we have found.

c) Help Stats for each existing Statistics

     help stats dwh_meta_ent.bla2 index (a);
     help stats dwh_meta_ent.bla2 column b;

        Date   Time   Number of Rows   Number of Nulls   Number of
     Intervals   Number of Uniques   Numeric   Sampled   Sampled Percent
     Version   Min Value   Mode Value   Mode Frequency   Max Value
     Mode Value   Mode Frequency   Non-Modal Values   Non-Modal Rows
        06/10/17   08:36:42   1   0   1   1
     Y   0   0   2   1001   1001   1   1001   1001
     1   0   0

        Date   Time   Number of Rows   Number of Nulls   Number of
     Intervals   Number of Uniques   Numeric   Sampled   Sampled Percent
     Version   Min Value   Mode Value   Mode Frequency   Max Value
     Mode Value   Mode Frequency   Non-Modal Values   Non-Modal Rows
        06/10/17   08:36:42   1   0   1   1
     Y   0   0   2   1002   1002   1   1002   1002
     1   0   0

In this case, the return set contains the following information:

     #  0 Date
     #  1 Time
     #  2 Number of Rows
     #  3 Number of Nulls
     #  4 Number of Intervals
     #  5 Number of Uniques
     #  6 Numeric
     #  7 Sampled
     #  8 Sampled Percent
     #  9 Version

     # 10 Min Value
     # 11 Mode Value
     # 12 Mode Frequency

     # 13 Max Value
     # 14 Mode Value
     # 15 Mode Frequency
     # 16 Non-Modal Values
     # 17 Non-Modal Rows

For other types, a Field-Count is added after the Version; for each Stats interval, a block of five fields is added, i.e. fields 13-17 are repeated.

d) Collect Stats

This information is now reordered for the Collect-Statistics-Values statement:

     1. Number of Nulls
     2. Number of Intervals
     3. Min-Value
     4. Mode-Value
     5. Mode Frequency
     6. Number of Uniques
     7. Number of Rows

     8. Max-Value
     9. Mode-Value
     10. Mode Frequency
     11. Non-Modal Values
     12. Non-Modal Rows

The Block 8-12 is repeated for each interval; all other input fields are not used.

In this example:

     collect statistics DWH_META_ENT.bla2 column b values
     (0,1,1002,1002,1,1,1,1002,1002,1,0,0);

     collect statistics DWH_META_ENT.bla2 index (a) values
     (0,1,1001,1001,1,1,1,1001,1001,1,0,0);

Result:

     help stats dwh_meta_ent.bla2 index (a);
     help stats dwh_meta_ent.bla2 column b;

        Date   Time   Number of Rows   Number of Nulls   Number of
     Intervals   Number of Uniques   Numeric   Sampled   Sampled Percent
     Version   Min Value   Mode Value   Mode Frequency   Max Value
     Mode Value   Mode Frequency   Non-Modal Values   Non-Modal Rows
        06/10/17   08:36:42   1   0   1   1
     Y   0   0   2   1001   1001   1   1001   1001
     1   0   0

        Date   Time   Number of Rows   Number of Nulls   Number of
     Intervals   Number of Uniques   Numeric   Sampled   Sampled Percent
     Version   Min Value   Mode Value   Mode Frequency   Max Value
     Mode Value   Mode Frequency   Non-Modal Values   Non-Modal Rows
        06/10/17   08:36:42   1   0   1   1
     Y   0   0   2   1002   1002   1   1002   1002
     1   0   0

I.e. the Stats were copied 1:1, only Date/Time are refreshed, and it only took 1 second instead of recollecting the Stats for minutes/hours (we already used it in tests for 1-Terabyte tables, and it worked very fine).

e) Remarks

1. As noted, the result set of Help-Stats is different for different Stats types; in all but this simple case, another field FIELDCOUNT is added (but not needed).

2. Non-numeric datatypes (char, date) have to be sent with quotes of course

3. For Multi-Column-Stats (and Multi-Column-Indexes), each column value is stored in a separate field, and has to be sent back the same way, i.e. all following fields are shifted (I didn't finish the complete algorithm yet, but it's possible), .

4. WARNING! This is a undocumented feature, and I got these results only by some testing, getting

- in best case a "invalid value list" error,

- in worst case a "please do not resubmit", so try it on a test machine first (or the Demo), and use at own risk!


5. Last but not least, many thanks to Dieter (of course!), for giving me the initial idea for all this during a training session some time ago... :-)

Currently, I'm working (when there's time) on a simple Perl program thatjust gets a source and a target table and transfers all available Stats (where possible) via ODBC to the target; if somebody's interested, I may post it here (once it's finished...)


Cheers

Klaus

--
Klaus Dittrich
Fachreferent Global Data Warehouse Systems
Vodafone D2 GmbH



     
  <Prev Next>   <<First <Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023