Archives of the TeradataForum

Message Posted: Fri, 23 Aug 2002 @ 08:00:07 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: PI and statistics

From:		Hiroshi Takenaka

Hi,

After my last posting, some people gave me advices. I also did further experiments and figured out:

* If I didn't collect stats on selection columns, the optimizer seems to choose 10% of total records as default estimated # of rows selected.

(The optimizer knows the exact number of total records if I collect stats on PI, or it does random sampling if there is no stats on PI)

* If I collect stats on the selection column, the optimizer seems to choose / <# of different values in the col> instead of 10% of total records as default guess. Everytime when I selected a non-existent value, the optimizer used that value.

* Even if I select a value which actually exists in the column, somehow the optimizer occasionally falls back to the default.

In my test data, with 83 out of 143 different values the optimizer estimates # of selected rows almost accurately, so it seems to use stats on that column. But with other 60 values, it falls back to the default: / <# of different values>.

Those 60 values seem to be values with small # of records, but not the 60 values with least records. It seems that those values are choosed somewhat randomly (depending on its PI?).

I'll be happy if the optimizer will make a bit better guess than /<# of different values>. Any further info will appreciate.

Regards.

--
Hiroshi Takenaka


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference