Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Fri, 06 Aug 2004 @ 12:51:59 GMT


     
  <Prev Next>   <<First <Prev Next> Last>>  


Subj:   Re: Redistribution Vs Duplication
 
From:   Coffing Christopher L

Prasanna

Make sure that you have run Collect Statistics on the tables. This gives the parsing engine better information to build the plan. You may also wish to evaluate secondary indexes. When joining on the PI you may find that there are additional benefits using a secondary if you have any residual criteria.

The objective is to redistribute the smaller table of course.

Another thought is the Primary Index selection. Oftentimes our ETL teams will define the PI on the tables so they have a UPI for even data distribution which is great until a user asks a question. Just because you have even data distribution for loads does not mean you have good performance for user queries. You can use the following SQL to determine how the indexes are being distributed and also determine if the data is on the same amp. You can only join data that is on the same AMP so if you are seeing a lot of redistribution, then the data is being moved to the appropriate AMP for joining.

     select column identifier (is use the PI),
     hashamp(hashbucket(hashrow(PI)))
            (integer) (named vproc), count(*)
     from databasename.tablename
     group by 1
     order by 1
     sample 20

The above SQL lets me know where my data resides and on what AMP. I would run this against both of your tables to determine if the PI's reside on the same AMP.

If the PI's are on the same AMP you would get a nice MERGE JOIN.

Hope this helps...


Best Regards

Chris Coffing
AFKS O&M Lead Teradata DBA
Certified Teradata Master



     
  <Prev Next>   <<First <Prev Next> Last>>  
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023