Archives of the TeradataForum

Message Posted: Tue, 11 Mar 2008 @ 22:28:15 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Spool space problem & questions

From:		Dieter Noeth

Michael Larkins wrote:

Even so, the EXPLAIN shows that TOP is implemented as a STAT function. I have heard that RANK (which requires an ORDER BY) when done with a QUALIFY will only place the requested number of rows on each AMP (i.e. 5 highest or lowest) locally and then globally return the requested values/rows (highest of the high or lowest of the low).

That's why RANK is quite fast even on a large number of rows, if the number of qualifying rows is small.

ROW_NUMBER could use the same optimization, but apparently it's implemented using a different (i.e. slower) algorithm, probably spooling all rows before QUALIFY.

It would be my hope that TOP might be implemented using the same logic as RANK/QUALIFY - but only the developers can tell us that for sure since the details of a STAT FUNCTION are not present in the EXPLAIN.

TOP without any ORDER will use "single AMP optimization" for up to 5000 rows (if the table is large), and "load distribution optimization" (i.e. several AMPS) for values > 5000 or small tables.

This is just reading the first datablock(s).

If it's still not retrieving the requested number of rows, then there's "execute step x", which seems to be a kind of "fallback" to a SAMPLE.

     "TOP 10 WITH TIES ... ORDER BY yy" is rewritten as
     "QUALIFY RANK() OVER (ORDER BY yy) <= 10"

     "TOP 10 ... ORDER BY yy"
     -> "QUALIFY ROW_NUMBER() OVER (ORDER BY yy) <= 10"

     "TOP 10 PERCENT WITH TIES... ORDER BY yy"
     -> "QUALIFY PERCENT_RANK() OVER (ORDER BY yy) <= 0.1"

     "TOP 10 PERCENT ... ORDER BY yy"
     -> there's no shortcut, but it's equivalent to the PERCENT_RANK,
        just using ROW_NUMBER instead of RANK.

Either way, as you indicated, it will use more spool than a simple select, but hopefully not all rows as you seem to imply.

Only Explain can show this: if it's on a base table without any join/where/group it's just accessing the first datablocks and no spool is prepared, anything else creates the whole answer set before the STAT function.

This behaviour is similar to a SAMPLE.

Dieter


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference