Archives of the TeradataForum

Message Posted: Tue, 22 Apr 2003 @ 08:00:12 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Select * from mytable

From:		Dieter N�th

Michael Larkins wrote:

The SAMPLE suggestion would absolutely return only 2000 rows. However, as mentioned, they are randomly selected. Not only that, but historically this function has been very slow, requiring all rows to be in SPOOL for the random selection to be implemented.

SAMPLE is much faster since V2R4.

SAMPLE works without spooling all rows.

The original posting indicated, "The astute user wants to look at the first 2000 rows..."

In other RDBMS there are clustered value-ordered indexes, so a SELECT without ORDER will usually return the rows in index order. But in Teradata the PI is hashed, so it will return the data in hash order and this looks totally unordered. The OP looks like he's satisfied with the unordered result set.

So, I might suggest the RANK function. It can be set up with the QUALIFY to return the first 2000 rows in the sequence of the value(s) by the "Primary Key" and/or Primary Index in Teradata.

A RANK will spool all rows and then create the rank. Ranking the PI is not faster, because it's hash-ordered. Even ranking the ROWID will spool all rows. So unless you really need ordered rows, SAMPLE is much more efficient than RANK/CSUM.

Locking MyTable for access
Select * from MyTable
QUALIFY RANK(PI_column_name ASC) <= 2000 ;
/* remember that RANK defaults to DESC */

A "SAMPLE 2000" will need about 2000 disk I/Os because it's returning random rows from probably different datablocks, that's why it seems to be slow. But it's fast compared to RANK.

Dieter


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference