Archives of the TeradataForum
Message Posted: Wed, 23 Jan 2002 @ 13:14:06 GMT
Due to the Teradata's parallel processing design, I feel Teradata's biggest problem is working with data which is very poorly distributed.
For instance, we have a problem involving 2 tables with the same NUPI. As it turns out, these 2 tables have poorly distributed data. When I join these 2 tables together on their respective NUPIs (the Optimizer choosing a merge join), the poor data distribution is amplified and some AMPs work ~400% harder than other AMPs.
I found a way of running the same query in half the time (with all AMPs working at the same rate) by choosing a new PI for the large table (with better data distribution) and forcing the Optimizer to duplicate the smaller table to all AMPs (I did this by creating a dummy NUPI with a default value of 1). With these new PIs, the Optimizer performed a hash join instead of a merge join.
What the Optimizer needs is a better way of detecting and avoid executing queries which will cause the AMPs to work at very uneven levels. Once these bad queries are detected, the Optimizer needs a way of executing a more efficient plan. In the scenario I described above, I'd like to be able to specify an alternate PI for these 2 tables (via stored as a Join Index or resolved (rehashed) at run time via some new feature) which the Optimizer would utilize to execute a more efficient plan.
My 2 Cents,
|Copyright 2016 - All Rights Reserved|
|Last Modified: 28 Jun 2020|