|
|
Archives of the TeradataForum
Message Posted: Thu, 09 Aug 2007 @ 20:30:07 GMT
Subj: | | Re: High skew while doing rank |
|
From: | | Dieter Noeth |
All rows with the same combination of values are sent to the same AMP and there are probably lots of rows with a low activity_count :-(
One possible way to rewrite it (untested, but syntax should be correct):
SELECT t.activity_prd_id ,
t.site_id,
t.user_id,
t.ACTIVITY_COUNT,
dt.activity_count_rank
FROM tb_test AS t JOIN
(
SELECT activity_prd_id ,
site_id,
ACTIVITY_COUNT,
COALESCE(SUM(cnt) OVER (PARTITION BY activity_prd_id , site_id
ORDER BY ACTIVITY_COUNT DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0)
+ 1 AS activity_count_rank
FROM
(
SELECT activity_prd_id,
site_id,
ACTIVITY_COUNT,
COUNT(*) AS cnt
FROM tb_test
GROUP BY 1,2,3
) dt
) dt
ON t.activity_prd_id = dt.activity_prd_id
AND t.site_id = dt.site_id
AND t.ACTIVITY_COUNT = dt.ACTIVITY_COUNT
If you need a DENSE_RANK instead of a RANK:
SELECT t.activity_prd_id ,
t.site_id,
t.user_id,
t.ACTIVITY_COUNT,
dt.activity_count_rank
FROM tb_test AS t JOIN
(
SELECT activity_prd_id,
site_id,
ACTIVITY_COUNT,
RANK() OVER (PARTITION BY activity_prd_id , site_id
ORDER BY ACTIVITY_COUNT DESC ) activity_count_rank
FROM tb_test
GROUP BY 1,2,3
) AS dt
ON t.activity_prd_id = dt.activity_prd_id
AND t.site_id = dt.site_id
AND t.ACTIVITY_COUNT = dt.ACTIVITY_COUNT
I don't know if those queries will be less skewed: The Derived Table will be small, but the optimizer might choose to redistribute the
large table for the join...
Dieter
| |