Archives of the TeradataForum
Message Posted: Thu, 11 Jan 2001 @ 14:49:50 GMT
As it was explained to me, the SAMPLE function selects the sample rows using the Master Index / Cylinder Index combination until it has reached the requested number or percentage of rows, and only then reads those rows from the table. So the work to do a sample should be much less than a full scan. In my short tests, sampling a single table took much less work than counting the rows of that table, and vastly less than scanning the rows of that table and then limiting the rows using retlimit and retcancel.
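For anyone who wants to repeat the comparison, here is a minimal sketch of the two kinds of request I mean. The table name sales_db.transactions is made up for illustration, and the relative cost will depend on your own system, so treat this as something to try rather than a benchmark.

```sql
-- Hypothetical table name; substitute one of your own.

-- Sample a fixed number of rows.
SELECT *
FROM sales_db.transactions
SAMPLE 1000;

-- Sample a fraction of the table (here 1 percent).
SELECT *
FROM sales_db.transactions
SAMPLE 0.01;

-- For comparison: counting every row in the same table.
SELECT COUNT(*)
FROM sales_db.transactions;
```

Running each request and comparing the resource usage your site records for them should show whether the sample is cheaper than the count on your data.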
Also, to limit the rows in a join, you should do the sample in a derived table and then join the result of the derived table to the other table. This should restrict the rows before joining.
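A rough sketch of that derived-table approach follows. The table and column names (sales_db.transactions, sales_db.items, item_id and so on) are assumptions for illustration only.

```sql
-- Hypothetical tables and columns, for illustration only.
SELECT s.store_id,
       s.item_id,
       i.item_desc
FROM (
    -- The sample is taken here, inside the derived table,
    -- so only the sampled rows take part in the join.
    SELECT store_id, item_id
    FROM sales_db.transactions
    SAMPLE 1000
) AS s
INNER JOIN sales_db.items AS i
    ON i.item_id = s.item_id;
```

If the SAMPLE clause were placed on the outer SELECT instead, it would apply to the result of the join, so the full join would have to be done first.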
By the way, the function uses a random number generator to decide which rows it takes from which AMPs, and since the data is usually spread randomly across the AMPs (possibly not with a very non-unique NUPI), the sample process yields very random results. In a test on a table with a UPI, the randomness of the results met the strict requirements of several statisticians at the retailer where I am currently working.