Archives of the TeradataForum
Message Posted: Thu, 18 Jun 1999 @ 00:47:19 GMT
I've just gotten access to a machine with V2R3 and look at what they've done to the EXPLAIN (there's more after the EXPLAIN):
Just a few details focusing on Step #2: The LC_CONT table has a total of 2,285,906 rows. The help statistics (and they are current) for this table:
Column Unique Values ------ ------------- LOC 5,168 CLU_NO 29,836 EXPLAIN LOCKING PDB.CLS_LOC_HST FOR ACCESS LOCKING PDB.SUBLN FOR ACCESS LOCKING PDB.CSTORES FOR ACCESS LOCKING PDB.LC_CONT FOR ACCESS LOCKING PDB.CDIV FOR ACCESS SEL D.DIV_SEQ , D.GRO_NO , D.CAT_NO , D.VBS_NO , S.DIV_NO , S.LN_NO , C.MTH_NO , COUNT(*) , SUM(H.REG_GRS_SLL_DLR) FROM PDB.CLS_LOC_HST H , PDB.SUBLN S , PDB.CSTORES C , PDB.LC_CONT L , PDB.CDIV D WHERE D.CAT_NO <> 'Y' AND L.CLU_NO = 57735 AND C.CYR_NO = 1999 AND C.YR_NO = 1999 AND H.SLS_CD in ('F','L','R','S','T') AND S.DIV_NO = D.DIV_NO AND C.LOC = L.LOC AND H.DIV_NO = S.DIV_NO AND H.LN_NO = S.LN_NO AND H.SBL_NO = S.SBL_NO AND H.MTH_NO = C.DB_MTH_NO AND H.LOC = C.LOC GROUP BY D.DIV_SEQ , D.GRO_NO , D.CAT_NO , D.VBS_NO , S.DIV_NO , S.LN_NO , C.MTH_NO ORDER BY D.DIV_SEQ , D.GRO_NO , D.CAT_NO , D.VBS_NO , S.DIV_NO , S.LN_NO , C.MTH_NO; *** Help information returned. 78 rows. *** Total elapsed time was 1 second.
Seeing the differences was pretty confusing. With a lot of help from Anita Richards, here's some useful info:
Some text from DR41584:
For join and retrieve steps, the confidence in the estimate and an estimate range (instead of only a single estimate) may be included in the explain. Note that this additional information increases the size of explains somewhat.
with high confidence
For base relations, a level of confidence is assigned as follows:
* If there exist applicable statistics on indexes/columns with restricting conditions, the base relation will have an estimate with high confidence.
* If there are no statistics but the optimizer can make an estimate based on random sampling on an index or statistics can be used but the restricting condition is an ANDed condition on non-indexed columns or an ORed condition, the base relation will have an estimate with low confidence.
* Otherwise, the base relation will have an estimate with no confidence.
A join will have a confidence level that is the lowest level confidence from its input relations, or no confidence if statistics do not exist in either join field. Alternatively, a join may be assigned index join confidence if there is join connection via a unique index.
For a retrieve from a spool, the confidence is the same as the step generating the spool (in cases where the cost is not reported, the confidence will also not be reported but can easily be derived from step that generated the spool).
Low or no confidence may indicate a higher degree of variance in the actual execution time and actual rows compared to the estimates. Collecting statistics on appropriate indexes and columns may improve the confidence, improve the estimates, reduce variances in the estimate from actuals, or generate an improved plan. See also the information for estimate phrases.
The s after row is omitted if the preceding count is 1.
Time is expressed as "x hours", "x hours and y minutes", "y minutes", "y minutes and z seconds", or "z.zz seconds" as appropriate. If only seconds are reported for both ltime and htime, for ltime, "seconds" is omitted. If the hours, minutes, or seconds are 1, the singular is used for hour, minute, or second, respectively).
If a row or time estimate is too large to display, three asterisks (***) are displayed instead.
The high end of a range (and preceding " to ") is not included in the explain if it is the same as the low end of the range, if it is not available, or if low end overflows (that is, if asterisks are printed for the low end).
The ltime (or low end) cost estimate is used in choosing the lowest cost plan and is calculated based on the l (or low end) row estimate. The low end row estimate is calculated as in previous releases. The low end estimates (as compared to midpoint or high end estimates) are used in choosing the plan in order to produce a reasonable plan that is most aggressive.
The htime (or high end) cost estimate has no affect on planning decisions and is calculated based on the h (or high end) row estimate. The high end row estimate gives an indication of a reasonable upper bound on the number of rows (cardinality) when more conservative assumptions are used compared to the low end estimate.
A low end row estimate, based on any statistics collected for a table, is used to estimate the size of the spool file needed to accommodate spooled data. If statistics have not been collected for a table, this estimate may be grossly incorrect (see "COLLECT STATISTICS" statement).
An estimated time is an approximation based on average times for the suboperations that comprise the overall operation, and the estimated number of rows involved in the operation. The accuracy of a time estimate is affected by the accuracy of the corresponding row estimate. The time shown is not clock time; you should consider it to be an arbitrary unit of measure that compares different operations.
The high end row estimate is propagated from one step to the next.
For a base relations, the high end row estimate is equal to the low end row estimate.
For a join, the high end row estimate is calculated as follows:
* If statistics are collected on the join fields or random sampling on indexes can be used, the high end row estimate will be calculated using the statistics or random sampling information and the high end row estimates of the input relations.
* Otherwise, the high end row estimate of a join result will be the product of the high end row estimates of the input relations.
The high end row estimate of a join relation may be the same as the low end row estimate when the join result total rows is estimated with low or high confidence.
A large variance between the low and high end estimates may indicate that the query and plan should be more closely examined, statistics need to be collected, the runtime may be significantly different than the low end estimate, or the execution of the query should be closely monitored. Usually, a large difference between the low and high end estimates means that statistics are missing; the user should try to correct it before running the query.
When the low or high end cost appears to be excessive, this may indicate the possibility of a run away plan. The user may then examine the query to see if any thing can be improved (check for an incorrectly, poorly expressed query, or missing statistics) or choose to run it only when there is enough resources. The user may want to closely monitor the execution of the query to see if it is executing closer to the low end cost estimate or actually runs much longer.
For a query, a significant difference in the high end estimates of one plan compared to a previous plan may indicate closer comparison of the two plans is needed (did a merge join become a product join?). Even though the low end estimates may be similar (leading the optimizer to choose a different plan), a difference in the high end estimate may indicate the potential for a longer execution time. Again, this may indicate statistics are missing or out of date.
As is true for the low end estimate, the high end estimate does not include the affects of the actual concurrent workload. They both can be affected by statistics that are out of date and no longer represent the demographics of the data. That is, out of date statistics may lead to overly pessimistic or optimistic estimates (estimates are based on the assumption that the statistics are representative of the data demographics).
The estimates may not reflect the actual number of rows returned or the actual execution time. The actuals may be outside the range. These estimates should be used only as indications and in comparing execution plans in a relative sense.
Note a plan is chosen based on the low end cost estimate. If the actual number of rows is close to the low end row estimate, this should be a good plan. However, if the actual is closer to the high end row estimate, then it is possible another plan (with a higher low end cost estimate but a lower high end cost estimate) may be better. Statistics will help the optimizer make better estimates and choose better plans.
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|