Union Conundrum [From Jeremy Christiansen: Thu, 04 Jan 2001 @ 20:26 GMT]

Message Posted: Thu, 04 Jan 2001 @ 20:26:57 GMT

Subj:

Union Conundrum

From:

Jeremy Christiansen

I am running a "union all" query that merges the result sets of two select statements. The query fails on insufficient spool. I am surprised by this because the total amount of data that should be returned by the query is about half of my total spool. My total spool is 50 gig and the query should return about 27 gig. Now the first possible cause of this problem that comes to mind is a "hot amp". I checked into this and it is not the case. I queried the peakspool in dbc.diskspace and the spool usage is more or less even. Furthermore, while the query was running I checked the currentspool. Before the query failed, I caught it using 43 gigabytes. (Note: this is the only query running under the userid I am logged on as.) I don't understand how it could be using 43 gigabytes. There's more, though. I ran each of the select statements individually to see what would happen . Executing alone, each select statement ran fine and used the expected amount of diskspace. In particular, the first select produced an answer set of 2.3 gig and the second 25 gig. I have included a copy of the explain below. Essentially the query is very simple. The first step is to join two tables, by way of the same primary index, and then build the result locally. The second step is to do a retrieve step from a table and then build that locally. That's it! I doubt that anyone can help with this, given the limited amount of information I've provided. Nonetheless, I'm writing this in the hope that some Teradata genius will point out an arcane feature of Teradata's internal machinery that might explain why this problem is occurring.

Explanation --------------------------------------------------

1)

First, we lock a distinct SURSDB."pseudo table" for read on a RowHash to prevent global deadlock for SURSDB.Line.

2)

Next, we lock a distinct SURSDB."pseudo table" for read on a RowHash to prevent global deadlock for SURSDB.Base.

3)

We lock SURSDB.Line for read, and we lock SURSDB.Base for read.

4)

We do an all-AMPs JOIN step from SURSDB.Base by way of a RowHash match scan with no residual conditions, which is joined to SURSDB.Line. SURSDB.Base and SURSDB.Line are joined using a merge join, with a join condition of ("(SURSDB.Base.claimreferencenumber = SURSDB.Line.claimreferencenumber) AND ((SURSDB.Base.claimlinenumber = SURSDB.Line.claimlinenumber) AND (SURSDB.Base.adjustmentcode = SURSDB.Line.adjustmentcode ))"). The input table SURSDB.Base will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated to be 5,546,400 rows. The estimated time for this step is 12 minutes and 40 seconds.

5)

We do an all-AMPs RETRIEVE step from SURSDB.Base by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated to be 51,657,984 rows. The estimated time for this step is 30 minutes and 55 seconds.

6)

Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.

->

The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0 hours and 43 minutes and 34 seconds.

Attachments

Library

Quick Reference

Archives of the TeradataForum

Message Posted: Thu, 04 Jan 2001 @ 20:26:57 GMT