Using rank within subgroups in Hive

Time: 2017-04-25 21:01:42

Tags: sql hadoop hive

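Hey there, I'm trying to get this query to work using rank(), but no luck:

select t.orig, t.id, count(*) as num_actions,
       rank() over (partition by t.orig order by count(*) desc) as rank
from sample_table t
where rank < 21 and
      t.month in (201607, 201608, 201609, 201610, 201611, 201612) and
      t.orig in (select tw.pageid from tw_sample as tw limit 50)
group by t.orig, t.id

I keep getting:

FAILED: SemanticException [Error 10004]: Line 4:6 Invalid table alias or column reference 'rank'

My goal is to get the top 20 rows for each t.orig, ranked by num_actions (the count(*) value).

If you could also explain where I went wrong so I can learn from it, that would be much appreciated.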

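1 Answer:

Answer 0: (score: 1)

You can't use a column alias in the where clause of the same query level. The where clause is evaluated before the select list, so the alias does not exist yet at that point, and window functions such as rank() are computed only after filtering and grouping. Compute the rank in a subquery and filter on it in the outer query: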

select *
from (select t.orig, t.id, count(*) as num_actions,
             -- rank each (orig, id) group within its orig by action count
             rank() over (partition by t.orig order by count(*) desc) as rnk
      from sample_table t
      where t.month in (201607, 201608, 201609, 201610, 201611, 201612) and
            t.orig in (select tw.pageid from tw_sample tw limit 50)
      group by t.orig, t.id
     ) t
-- the alias is visible here because the subquery has already been evaluated
where rnk < 21
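
If you want to try the pattern without a Hive cluster, here is a minimal sketch that runs the same shape of query on toy data. It makes several assumptions that are not from the question: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer, for window functions) as a stand-in for Hive, the rows are made up, the month and tw_sample filters are left out, and it keeps the top 2 per orig instead of the top 20 so the effect is visible on a tiny table.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for Hive; the data below is invented.
conn = sqlite3.connect(":memory:")
conn.execute("create table sample_table (orig text, id text, month integer)")
rows = [
    ("a", "x", 201607), ("a", "x", 201608), ("a", "x", 201609),  # a/x: 3 actions
    ("a", "y", 201607), ("a", "y", 201608),                      # a/y: 2 actions
    ("a", "z", 201609),                                          # a/z: 1 action
    ("b", "x", 201610),                                          # b/x: 1 action
    ("b", "y", 201611), ("b", "y", 201612),                      # b/y: 2 actions
]
conn.executemany("insert into sample_table values (?, ?, ?)", rows)

top_n = 2  # hypothetical cutoff; the question uses 20 (rnk < 21)

# The rank is computed in the inner query and filtered in the outer one,
# where the rnk alias is already an ordinary column.
query = """
select orig, id, num_actions, rnk
from (select t.orig, t.id, count(*) as num_actions,
             rank() over (partition by t.orig order by count(*) desc) as rnk
      from sample_table t
      group by t.orig, t.id
     ) ranked
where rnk <= ?
order by orig, rnk
"""
result = conn.execute(query, (top_n,)).fetchall()
for row in result:
    print(row)
```

Each printed row is (orig, id, num_actions, rnk). Note that rank() gives tied groups the same rank, so an orig with ties at the cutoff can return more than top_n rows; row_number() would be the choice if you need exactly that many.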