Archives of the TeradataForum
Message Posted: Tue, 22 Apr 2003 @ 08:00:12 GMT
| Subj: || || Re: Select * from mytable |
| From: || || Dieter Nöth |
Michael Larkins wrote:
| ||The SAMPLE suggestion would absolutely return only 2000 rows. However, as mentioned, they are randomly selected. Not only that, but
historically this function has been very slow, requiring all rows to be in SPOOL for the random selection to be implemented.|| |
SAMPLE is much faster since V2R4.
SAMPLE works without spooling all rows.
| ||The original posting indicated, "The astute user wants to look at the first 2000 rows..."|| |
In other RDBMS there are clustered value-ordered indexes, so a SELECT without ORDER will usually return the rows in index order. But in
Teradata the PI is hashed, so it will return the data in hash order and this looks totally unordered. The OP looks like he's satisfied with
the unordered result set.
| ||So, I might suggest the RANK function. It can be set up with the QUALIFY to return the first 2000 rows in the sequence of the
value(s) by the "Primary Key" and/or Primary Index in Teradata.|| |
A RANK will spool all rows and then create the rank. Ranking the PI is not faster, because it's hash-ordered. Even ranking the ROWID
will spool all rows. So unless you really need ordered rows, SAMPLE is much more efficient than RANK/CSUM.
| ||Locking MyTable for access|
Select * from MyTable
QUALIFY RANK(PI_column_name ASC) <= 2000 ;
/* remember that RANK defaults to DESC */
A "SAMPLE 2000" will need about 2000 disk I/Os because it's returning random rows from probably different datablocks, that's why it seems
to be slow. But it's fast compared to RANK.