Hive problem with row_number() over() syntax

Time: 2017-05-03 10:51:23

Tags: oracle hadoop hive

I have a Hive table (details below):

hive> select * from abcd ;
OK
a   1   1
b   2   2
a   3   3
Time taken: 0.261 seconds, Fetched: 3 row(s)
hive> desc abcd;
OK
val001                  string                                      
val002                  int                                         
val003                  int                                         
Time taken: 0.084 seconds, Fetched: 3 row(s)

I am running the following query but get the error shown after it:

select max(rnk) rnk, max(val) val, sum(cnt) cnt from (select val, count(*) cnt, row_number() over (order by case val  when null then 0 else count(*) end desc, val) rnk from (select VAL001 val from abcd ) group by val) group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;

FAILED: ParseException line 3:55 missing ) at 'by' near 'by'
line 3:58 missing EOF at 'val' near 'by'

I am looking for the following result from the query above:

RNK VAL                CNT
--- ------------------------------ ---
1   a                    2
2   b                    1

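I can get this result from an Oracle database with a similar table. The only difference is that in Oracle I use decode in the ORDER BY of the window; since Hive does not support decode that way, I replaced it with a CASE expression.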

Here is the working Oracle SQL query:

    SQL> select max(rnk) rnk, max(val) val, sum(cnt) cnt from 
    (select val, count(*) cnt, row_number() over (order by 
    decode(val,null,0,count(*)) desc, val) rnk from (select VAL001 val from 
    table_name ) group by val)
    group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;   

RNK VAL                CNT
--- ------------------------------ ---
 1 a                     2
 2 b                     1

Can anyone help me fix the Hive query? Let me know if you need more details.

2 answers:

Answer 0: (score: 1)

This is your query. I suspect there is a simpler way to get what you want:

select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
             row_number() over (order by case val when null then 0 else count(*) end desc, val) as rnk
      from (select VAL001 val from abcd )
      group by val
     )
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;

I think you just need table aliases on the subqueries in the from clause:

select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
             row_number() over (order by case val when null then 0 else count(*) end desc, val) as rnk
      from (select VAL001 val from abcd
           ) x
      group by val
     ) x
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
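
One caveat about the ordering expression, offered as a side note: `case val when null then ...` is the simple CASE form, which tests `val = null`, and that comparison is never true in SQL. Oracle's `decode`, by contrast, treats two nulls as equal, so the two queries do not order a null group the same way. A minimal, untested sketch of a closer translation, assuming the same `abcd` table and using a searched CASE with `is null` (the aliases `t` and `r` are arbitrary names chosen here):

```sql
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
             -- searched CASE: "val is null" matches a null group, like decode(val, null, 0, count(*))
             row_number() over (order by case when val is null then 0 else count(*) end desc, val) as rnk
      from (select VAL001 as val from abcd) t   -- alias on the inner derived table
      group by val
     ) r                                        -- alias on the outer derived table
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
```

With the three sample rows there are no nulls, so both forms should give the same result; the difference only shows once `VAL001` contains null values.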

Answer 1: (score: 0)

This is technically not simpler, but it may be easier to read:

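The first subquery does the counting and ranking.

The second subquery assigns each row a category: top 1 through top 100, plus the special categories other and unknown.

The final query does the grouping.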

with cnt as (
 select VAL001 val, 
  count(*)  as cnt, 
  row_number() over (order by decode(VAL001,null,0,count(*)) desc, VAL001) as rnk
 from  abcd
 group by VAL001),
ctg as (
 select 
  val, cnt, rnk,
  case when val is NULL then 'unknown'
       when rnk <= 100 then 'top '||rnk
       else 'other' end as category_code
 from cnt)
select 
  max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from  ctg
group by category_code
order by 1
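
Note that `decode` in this Oracle sense is exactly what the question says Hive lacks, and `||` concatenation and `order by 1` may not work on older Hive versions. A hedged, untested sketch of the same three-step query using only `case`, `concat`, and `cast`, assuming the `abcd` table from the question:

```sql
with cnt as (
  -- step 1: count per value and rank; a null value gets sort key 0, so it ranks last
  select VAL001 as val,
         count(*) as cnt,
         row_number() over (order by case when VAL001 is null then 0 else count(*) end desc, VAL001) as rnk
  from abcd
  group by VAL001),
ctg as (
  -- step 2: bucket each row into 'top N', 'other', or 'unknown'
  select val, cnt, rnk,
         case when val is null then 'unknown'
              when rnk <= 100 then concat('top ', cast(rnk as string))
              else 'other' end as category_code
  from cnt)
-- step 3: aggregate per bucket
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from ctg
group by category_code
order by rnk;
```

Ordering by the `rnk` alias instead of the column position avoids depending on how a given Hive installation handles positional references.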