EXPLAIN cost estimates [From Pekka Kostamaa: Wed, 11 Jan 2001 @ 05:11 GMT]

Message Posted: Wed, 11 Jan 2001 @ 05:11:34 GMT

Subj:

EXPLAIN cost estimates

From:

Pekka Kostamaa

Recently there have been some posts and questions about the EXPLAIN cost estimates. These cost estimates are used by the optimizer, when exploring the alternative access paths and join orders to execute the query. The purpose of these cost estimates is for the optimizer to pick the optimal plan.

Internally, the optimizer costs three resources: CPU, Disk I/O, and the Interconnect (Bynet). The goal of the optimizer is to find a plan that minimizes the use of these three resources. This is different from some other databases, where the goal of the optimizer is to minimize the overall response time. Minimizing resource use also maximizes the overall workload throughput, which is not necessarily true for optimizers that minimize response time. This is another example of why Teradata is good for multi-user decision support applications.

These cost estimates are externally visible to the user by using the EXPLAIN. Teradata uses time as a unit. Some other databases use unitless values, often called resource units. This might have been a better choice by Teradata also, since the accuracy of the times will never be too good.

There are several reasons for this. First, the times are based on statistical information about the selectivities of the query. As many of you know, this can be difficult depending on the type of clause, and whether or not statistics are present on the columns and indexes the optimizer needs to consider. Second, the times assume an otherwise idle system. In other words, the optimizer only costs a single query, and does not take into account the workload on the system. This is not even possible, since the query is executed at a different time from when it is optimized. The workload on the system can and does change over time.

Some steps in the query plan show no cost estimates. This is because the optimizer typically does not have a choice for these steps, and, therefore, there is no need to cost the step. An example, which is not true in the newer versions of the Teradata system, but which is easy to explain, is the aggregation (aka SUM) step. Teradata used to only execute aggregation as the last step of a plan. Therefore, there was never a need to cost this step, just execute it always as the last step. Note that this is not true in the current releases of Teradata, where the aggreation steps can be moved around in the query tree. To be able to do this with any step, it has to have a cost associated with it.

Hope this helps,

Pekka Kostamaa

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Wed, 11 Jan 2001 @ 05:11:34 GMT