Archives of the TeradataForum

Message Posted: Fri, 06 Aug 2004 @ 12:51:59 GMT


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Subj:		Re: Redistribution Vs Duplication

From:		Coffing Christopher L

Prasanna

Make sure that you have run Collect Statistics on the tables. This gives the parsing engine better information to build the plan. You may also wish to evaluate secondary indexes. When joining on the PI you may find that there are additional benefits using a secondary if you have any residual criteria.

The objective is to redistribute the smaller table of course.

Another thought is the Primary Index selection. Oftentimes our ETL teams will define the PI on the tables so they have a UPI for even data distribution which is great until a user asks a question. Just because you have even data distribution for loads does not mean you have good performance for user queries. You can use the following SQL to determine how the indexes are being distributed and also determine if the data is on the same amp. You can only join data that is on the same AMP so if you are seeing a lot of redistribution, then the data is being moved to the appropriate AMP for joining.

     select column identifier (is use the PI),
     hashamp(hashbucket(hashrow(PI)))
            (integer) (named vproc), count(*)
     from databasename.tablename
     group by 1
     order by 1
     sample 20

The above SQL lets me know where my data resides and on what AMP. I would run this against both of your tables to determine if the PI's reside on the same AMP.

If the PI's are on the same AMP you would get a nice MERGE JOIN.

Hope this helps...

Best Regards

Chris Coffing
AFKS O&M Lead Teradata DBA
Certified Teradata Master


	<Prev	Next>		<<First	<Prev	Next>	Last>>

Attachments

Library

Quick Reference