Archives of the TeradataForum
Message Posted: Tue, 22 Apr 2003 @ 16:11:07 GMT
Hi again Everyone:
Since Dieter felt the need to clarify my reply, there is obviously still some confusion about how RANK and SAMPLE function internally.
I have been told that RANK was changed in V2R4 so that, when QUALIFY is used to limit the number of rows returned, it pulls only a specific range of rows from each AMP.
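The behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Teradata's actual implementation: the AMP count, the CRC32 stand-in for the row hash, and the helper names `amp_for`, `distribute` and `top_n_via_amps` are all hypothetical. Each simulated AMP hands back only its n lowest rows, so the final merge looks at no more than n rows per AMP instead of the whole table.

```python
import heapq
import zlib

NUM_AMPS = 4  # hypothetical AMP count, chosen only for illustration


def amp_for(pi_value: int) -> int:
    """Map a primary-index value to an AMP (CRC32 stands in for the real row hash)."""
    return zlib.crc32(str(pi_value).encode()) % NUM_AMPS


def distribute(rows):
    """Place each (pi_value, sort_value) row on the AMP its PI value hashes to."""
    amps = [[] for _ in range(NUM_AMPS)]
    for row in rows:
        amps[amp_for(row[0])].append(row)
    return amps


def top_n_via_amps(amps, n):
    """Each AMP returns only its n lowest rows; the merge sees at most n * NUM_AMPS rows."""
    def by_sort_value(row):
        return row[1]

    local = [heapq.nsmallest(n, amp, key=by_sort_value) for amp in amps]
    candidates = [row for part in local for row in part]
    return heapq.nsmallest(n, candidates, key=by_sort_value), len(candidates)


# 10,000 rows with distinct sort values, spread over the simulated AMPs.
rows = [(i, (i * 7919) % 10007) for i in range(10000)]
amps = distribute(rows)
first_rows, examined = top_n_via_amps(amps, 20)
```

Here `first_rows` matches what a full sort of all 10,000 rows would give for the lowest 20, while `examined` stays bounded by 20 rows per AMP. That bound is the kind of saving the change to RANK is said to provide.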
I was also told that this internal change had not been implemented for the SAMPLE function in V2R4, so SAMPLE is still slow. Maybe Todd can help clear up the confusion about the internal operations and, therefore, the speed of each alternative.
Most of us are well aware of the hashing operation, its role in the distribution and sequencing of rows, and its importance to the PI in Teradata. Hashing makes the sequence of rows in a table appear random. This is a given, and obvious to even the casual observer.
That is specifically why I mentioned RANK with ASC as the way to get the "first" 2000 rows described in the original posting: it does a sort. Even with an ORDER BY, SAMPLE has a VERY low probability of returning a predictable result, meaning the "first 2000 rows," which is exactly as it should be for a random sample.
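The difference can be shown with a small Python analogy. It is a sketch under assumptions, not Teradata code: the shuffle merely imitates rows stored in hash order, and `first_n_by_rank` and `sample_then_order` are hypothetical helpers standing in for RANK ... ASC with QUALIFY and for SAMPLE followed by ORDER BY.

```python
import random

# 10,000 rows keyed 1..10000; the "first 2000" are keys 1..2000.
rows = [(key, f"row-{key}") for key in range(1, 10001)]

# Rows sit in storage in hash order, imitated here with a shuffle.
stored = rows[:]
random.Random(2003).shuffle(stored)


def first_n_by_rank(table, n):
    """Analogue of ranking ascending and keeping ranks 1..n: sort, then cut."""
    return sorted(table, key=lambda row: row[0])[:n]


def sample_then_order(table, n, rng):
    """Analogue of SAMPLE plus ORDER BY: the sort only orders whichever rows were drawn."""
    return sorted(rng.sample(table, n), key=lambda row: row[0])


ranked = first_n_by_rank(stored, 2000)
# Seeded only so the sketch is repeatable; a real sample would differ on every run.
sampled = sample_then_order(stored, 2000, random.Random(42))
overlap = len({row[0] for row in ranked} & {row[0] for row in sampled})
```

`ranked` always holds keys 1 through 2000, however the rows were stored. `sampled` comes back neatly ordered too, but it holds 2000 rows drawn from the whole table, so on average only about 400 of them (2000 x 2000 / 10000) fall among the first 2000.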
Beyond this point, I leave it up to the readers to draw their own conclusions on the best solution to their particular situation. Functionality is an important part of the equation, but surely, performance ranks (get it?) right up there too.
Last Modified: 27 Dec 2016