Home Page for the TeradataForum
 

Archives of the TeradataForum

Message Posted: Thu, 09 Aug 2007 @ 20:30:07 GMT


     
  <Prev Next>   <<First <Prev
Next>
Last>>
 


Subj:   Re: High skew while doing rank
 
From:   Dieter Noeth

All rows with the same combination of values are sent to the same AMP and there are probably lots of rows with a low activity_count :-(

One possible way to rewrite it (untested, but syntax should be correct):

     SELECT t.activity_prd_id ,
        t.site_id,
        t.user_id,
        t.ACTIVITY_COUNT,
        dt.activity_count_rank
     FROM tb_test AS t JOIN
       (
        SELECT activity_prd_id ,
          site_id,
          ACTIVITY_COUNT,
          COALESCE(SUM(cnt) OVER (PARTITION BY activity_prd_id , site_id
                         ORDER BY ACTIVITY_COUNT DESC
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0)
          + 1 AS activity_count_rank
        FROM
         (
          SELECT activity_prd_id,
            site_id,
            ACTIVITY_COUNT,
            COUNT(*) AS cnt
          FROM tb_test
          GROUP BY 1,2,3
         ) dt
       ) dt
     ON t.activity_prd_id = dt.activity_prd_id
     AND t.site_id = dt.site_id
     AND t.ACTIVITY_COUNT = dt.ACTIVITY_COUNT

If you need a DENSE_RANK instead of a RANK:

     SELECT t.activity_prd_id ,
        t.site_id,
        t.user_id,
        t.ACTIVITY_COUNT,
        dt.activity_count_rank
     FROM tb_test AS t JOIN
       (
        SELECT activity_prd_id,
          site_id,
          ACTIVITY_COUNT,
          RANK() OVER (PARTITION BY  activity_prd_id , site_id
                       ORDER BY ACTIVITY_COUNT DESC ) activity_count_rank
        FROM tb_test
        GROUP BY 1,2,3
       ) AS dt
     ON t.activity_prd_id = dt.activity_prd_id
     AND t.site_id = dt.site_id
     AND t.ACTIVITY_COUNT = dt.ACTIVITY_COUNT

I don't know if those queries will be less skewed: The Derived Table will be small, but the optimizer might choose to redistribute the large table for the join...


Dieter



     
  <Prev Next>   <<First <Prev
Next>
Last>>
 
 
 
 
 
 
 
 
 
  
  Top Home Privacy Feedback  
 
 
Copyright for the TeradataForum (TDATA-L), Manta BlueSky    
Copyright 2016 - All Rights Reserved    
Last Modified: 15 Jun 2023