Archives of the TeradataForum

Message Posted: Wed, 10 Jan 2001 @ 20:41:09 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Limited Select

From:		Frank Martinez

Hmm, I didn't think the SAMPLE did the same amount of work. The sixth volume of the SQL manual (there are six volumes now? Argh!) has the following statement in it on pg 1-60:

---------------------------------------- Example 2: Using count_description

Suppose you want to see if your customers are in at least 100 cities. The SELECT statement to do that is the following:

     SELECT COUNT (DISTINCT city)
     FROM (SELECT city FROM customer_table
     SAMPLE 1000) TEMP
     ;

If customer_table is large, the SAMPLE 1000 clause would not require a full scan of the table and the sort for DISTINCT would only handle 1000 rows.

-----------------------------------------

I also did two separate EXPLAINS and got the following differing results:

     EXPLAIN
     SELECT OrgCode, AgencyCode
       FROM Budget;

1) First, we lock a distinct HCFTEST."pseudo table" for read on a RowHash to prevent global deadlock for HCFTEST.Budget. 2) Next, we lock HCFTEST.Budget for read. 3) We do an all-AMPs RETRIEVE step from HCFTEST.Budget by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 32,358 rows. The estimated time for this step is 1.19 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

`Explanation --------------------------------------------------`
	`->`	`The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 1.19 seconds.`

     EXPLAIN
     SELECT OrgCode, AgencyCode
       FROM Budget
     SAMPLE 10;

1) First, we lock a distinct HCFTEST."pseudo table" for read on a RowHash to prevent global deadlock for HCFTEST.Budget. 2) Next, we lock HCFTEST.Budget for read. 3) We do an all-AMPs SAMPLING step from HCFTEST.Budget by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. Samples are specified as number of rows.. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

`Explanation --------------------------------------------------`
	`->`	`The contents of Spool 1 are sent back to the user as the result of statement 1.`

Seems to be doing something different, although I hate it when the optimizer doesn't return a time estimate. Somebody want to tell me why it doesn't all the time? Did we give up doing that all the time, or is it just a feature that introduces a little randomness in our lives? I taught the advanced SQL class a year ago in Mexico for a client, and at that time there was nothing on what SAMPLE did specifically, but I would think that the PE would tell each amp, hey, send back X records or X percentage of the records (with only one sample that is) that you have. Somebody want to comment?

Frank C. Martinez IV


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference