Home Page for the TeradataForum

Archives of the TeradataForum

Message Posted: Tue, 10 Jul 2001 @ 15:54:55 GMT


Subj:   Optimizing query path between large and small tables
From:   Anomy Anom

<-- Anonymously Posted: Tuesday, July 10, 2001 11:25 -- >

Hi all,

I have a job whose structure contains a large fact table of over 500 million rows and 2 small ref. tables of under 200 rows each.

The issue I have is how the optimizer decides to execute the query depending on which indexes are set on the small tables (the choices I have come up with so far do not seem to give an ideal solution).

The fact table is joined to both of the smaller tables on what would be one half of a UPI on the 2 ref. tables (this column is just a standard column on the fact table, which has a NUPI on a different field, customer number).
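For illustration, here is a minimal sketch of the layout described above, assuming invented table and column names (the post does not give the real DDL). The ref. tables carry the two-column UPI, and the fact table has its NUPI on customer number:

```sql
-- Hypothetical DDL for the setup described (all names invented)
CREATE TABLE big_fact
( cust_no    INTEGER NOT NULL        -- NUPI: customer number
, ref1_code  INTEGER                 -- plain column, joins to ref_1
, ref2_code  INTEGER                 -- plain column, joins to ref_2
, txn_amt    DECIMAL(15,2)
) PRIMARY INDEX (cust_no);

CREATE TABLE ref_1
( ref1_code  INTEGER NOT NULL        -- one half of the UPI, used in the join
, ref1_seq   INTEGER NOT NULL        -- the other half of the UPI
, ref1_desc  VARCHAR(50)
) UNIQUE PRIMARY INDEX (ref1_code, ref1_seq);

CREATE TABLE ref_2
( ref2_code  INTEGER NOT NULL
, ref2_seq   INTEGER NOT NULL
, ref2_desc  VARCHAR(50)
) UNIQUE PRIMARY INDEX (ref2_code, ref2_seq);

-- The query in question: the 500M-row fact table joined to both small tables
SELECT f.cust_no, r1.ref1_desc, r2.ref2_desc
FROM   big_fact f
JOIN   ref_1 r1 ON f.ref1_code = r1.ref1_code
JOIN   ref_2 r2 ON f.ref2_code = r2.ref2_code;
```

In the first scenario discussed below, the ref. tables would instead be created with `PRIMARY INDEX (ref1_code)` and `PRIMARY INDEX (ref2_code)`, making the join column a NUPI (it is only half of the unique combination, so it cannot be a UPI on its own).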

When the join columns on the 2 ref. tables are set as NUPIs (with stats collected on the NUPIs), and are therefore accessed down this path, the optimizer decides to:

1. Product join the 2 ref. tables - fine.

2. Take the fact table (with the filter conditions applied) and redistribute it by the hash code of the join column??

3. Join the 2 spools from the above steps on the NUPI of the ref. tables.

This is a problem because the redistribution heavily skews the job at this point, due to the popularity of some values from the ref. tables (up to 99%).
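The skew in step 2 follows from hash redistribution: every fact row carrying the same join value hashes to the same AMP, so a code held by 99% of the rows piles onto a single AMP. Collecting statistics on the fact table's join columns (not only on the ref. tables' indexes) is one way to let the optimizer see that demographic; a sketch, with names invented for illustration:

```sql
-- Let the optimizer see the skewed value demographics on the fact side
-- (table/column names invented for illustration)
COLLECT STATISTICS ON big_fact COLUMN ref1_code;
COLLECT STATISTICS ON big_fact COLUMN ref2_code;

-- Eyeball the skew the redistribution will inherit: how many rows
-- carry each code (the post reports popular values of up to 99%)
SELECT ref1_code, COUNT(*) AS row_cnt
FROM   big_fact
GROUP BY 1
ORDER BY row_cnt DESC;
```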

When the 2 ref. tables are set as UPIs (with stats collected on the UPIs), and the PI is therefore not used as the access path, the optimizer does the following:

1. Retrieves one of the ref. tables and duplicates it across all AMPs - ok.

2. Retrieves the second ref. table and duplicates it across all AMPs - ok.

3. Takes the spool from the first ref. table and product joins it to the fact table, applying the join condition (without using the index this time) and any filter conditions; the spool is built locally -??

4. Takes the spools from steps 2 and 3 and performs the same join as in step 3, but this time as a merge join.

This query does not skew but is very resource intensive (possibly down to a product join of 500 million x 200?). Is there a better way of designing the query/tables, or is this the norm for large table/small table joins, with each method having its trade-offs?

At the end of the day both methods work, but I was wondering if there is a preferred method, or any other tweaks that could perform better than the 2 examples above when dealing with such joins.
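One cheap way to weigh the two designs side by side is to EXPLAIN the query under each indexing choice and compare the join strategies and estimated row counts before running anything; a sketch, with names invented for illustration:

```sql
-- EXPLAIN reports the plan the optimizer would use (redistribute vs.
-- duplicate, product vs. merge join) and its estimated row counts,
-- without touching the 500M-row table
EXPLAIN
SELECT f.cust_no, r1.ref1_desc
FROM   big_fact f
JOIN   ref_1 r1 ON f.ref1_code = r1.ref1_code;
```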

Thanks in advance.

Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 28 Jun 2020