Archives of the TeradataForum

Message Posted: Fri, 11 Feb 2005 @ 21:20:29 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Guidelines for using Distinct vs. group by

From:		Dieter Noeth

David A Hough wrote:

* Does the SQL standard call for a DISTINCT in the subquery, or is it left to the implementers?

SQL Standard usually tells about expected results, but not how to achieve it.

* Is the DISTINCT versus GROUP BY implementation difference on the Teradata unique to this platform (perhaps because the data is partitioned on many nodes)?

AFAIK most DBMSes show the same plan for both versions...

* Does anyone have benchmark results for DISTINCT versus GROUP BY performance on the Teradata for different result set percentages?

Table populated with random values (10,000,000 rows, 4 AMPs),
CPU Time from DBQL:

     Distinct     CPU Time    CPU Time
        Values     Distinct    Group By
       6320690        94,59      207,59
        999949        90,61      133,17
        100000        81,83       66,56
         10000        77,47       16,00
          1000        74,08       13,47
           100        69,30       12,81
            10        52,50       12,44
             1        45,94       11,45

         10000        38,92       11,52

The last result is for NUPI-columns

I would be glad to see results for other (large) systems :-)

     create multiset table distinct_test
     (x1 int,
       x10 int,
       x100 int,
       x1000 int,
       x10000 int,
       x100000 int,
       x100000nupi int,
       x1000000 int,
       x10000000 int
     ) primary index (x100000nupi)
     ;

     insert into distinct_test
     select
        1,
        r / 100000,
        r / 10000,
        r / 1000,
        r / 100,
        r / 10,
        random(0,1000000) as r,
        r,
        random(-2000000000,2000000000)
     from
       (select 1 as x from sys_calendar.calendar
         sample with replacement 10000000) dt;

     sel count(*) from (sel distinct x10000000 from distinct_test) dt;
     sel count(*) from (sel x10000000 from distinct_test group by 1) dt;

     sel count(*) from (sel distinct x1000000 from distinct_test) dt;
     sel count(*) from (sel x1000000 from distinct_test group by 1) dt;

     sel count(*) from (sel distinct x100000 from distinct_test) dt;
     sel count(*) from (sel x100000 from distinct_test group by 1) dt;

     sel count(*) from (sel distinct x10000 from distinct_test) dt;
     sel count(*) from (sel x10000 from distinct_test group by 1) dt;

     sel count(*) from (sel distinct x1000 from distinct_test) dt;
     sel count(*) from (sel x1000 from distinct_test group by 1) dt;

     sel count(*) from (sel distinct x100 from distinct_test) dt;
     sel count(*) from (sel x100 from distinct_test group by 1) dt;

     sel count(*) from (sel distinct x10 from distinct_test) dt;
     sel count(*) from (sel x10 from distinct_test group by 1) dt;

     sel count(*) from (sel distinct x1 from distinct_test) dt;
     sel count(*) from (sel x1 from distinct_test group by 1) dt;

     sel count(*) from (sel distinct x100000nupi from distinct_test) dt;
     sel count(*) from (sel x100000nupi from distinct_test group by 1) dt;

Dieter


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference