Re: Indexes vs statistics [From Christie, Jon: Thu, 29 Jan 2004 @ 21:56 GMT]

Message Posted: Thu, 29 Jan 2004 @ 21:56:42 GMT

Subj:

Re: Indexes vs statistics

From:

Christie, Jon

Aggregation cardinality is often hard to estimate. It is equal to the number of distinct values for the grouping columns. So if the set of grouping columns coincides with an index and you've collected statistics on the index, the optimizer can make a dead-on estimate. Otherwise, it's a crap shoot. We used to estimate that the aggregate cardinality was the square root of the number of rows going in. That's the geometric mean of the extreme possibilities; either 1 or N where N is the number of rows going in. But that estimate was often too low, and generally speaking, estimates that are too low get the optimizer in trouble more often than estimates that are too high. So the estimate was changed. I'm not 100% sure what the estimate is today, though I remember at one time it was 90% of the rows going in. That may still be the estimate, but I wouldn't count on it in every case.

Recommendation: collect stats on grouping columns. In other words, for GROUP BY X,Y,Z collect statistics on (X,Y,Z) even if they're not an index. Of course, if you've got something like GROUP BY T1.X, T2.Y, T3.Z there's not anything you can do.

Aggregation in a view causes "spooling." I'm sure you've heard about this. In the resolver (the parser phase prior to the optimizer) views (and derived tables) with a GROUP BY clause are resolved and then sent ahead to the optimizer which builds a plan to "materialize" the view (or derived table) and sends this plan (in the form of a tree) back to the resolver. In the original query, the view (or derived table) is replaced by a spool. When we get to the main optimizer phase, the optimizer has almost no information on what's inside the spool. It's a black box. This effects the confidence of estimates. I'm not sure if they become no confidence or low confidence, but they're certainly no higher.

In V2R5, derived tables are no longer automatically spooled. Instead, they are "handled like views." But since views with aggregation are spooled and since one of the primary uses of derived tables is to move aggregation up in a query or to do multiple levels of aggregation, you're still going to see spooling. But not as much as you used to.

My, my. I have rambled on. I hope you can make sense of what I said.

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Thu, 29 Jan 2004 @ 21:56:42 GMT