Archives of the TeradataForum

Message Posted: Wed, 05 Dec 2002 @ 00:59:23 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Difference between Tera Data and SQL

From:		Hough, David A

Just to make sure we've covered everything...

You can produce distinct result sets in two ways:

SELECT DISTINCT x FROM TABLE;

-or-

SELECT x FROM TABLE GROUP BY x;

The Teradata uses two different methods to produce the result sets, however, and the performance varies dramatically depending on the data.

SELECT DISTINCT is designed for data sets that are nearly unique to begin with, and works by sorting the entire intermediate spool file and discarding duplicate values. SELECT/GROUP BY is designed for data sets that have relatively few unique values, and works by performing an amp local grouping operation & then merging the partial result sets for final processing.

I've never been able to measure the exact performance crossover point, but I use this estimate: if the results are 90% unique use DISTINCT, if the results are 10% unique use GROUP BY, if in between test both (or use GROUP BY in the absence of test time).

BTW, the difference between DISTINCT and GROUP BY is why you sometimes see people use GROUP BY in subqueries (which use DISTINCT by default) to improve performance (dramatically in some cases). This is hard to do with tool generated SQL, but it's still something to consider if you can tune the output from the tool.

/dave hough


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference