I have a Hive table (details below):
hive> select * from abcd ;
OK
a 1 1
b 2 2
a 3 3
Time taken: 0.261 seconds, Fetched: 3 row(s)
hive> desc abcd;
OK
val001 string
val002 int
val003 int
Time taken: 0.084 seconds, Fetched: 3 row(s)
I am running the following query but get the error below:
select max(rnk) rnk, max(val) val, sum(cnt) cnt from (select val, count(*) cnt, row_number() over (order by case val when null then 0 else count(*) end desc, val) rnk from (select VAL001 val from abcd ) group by val) group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
FAILED: ParseException line 3:55 missing ) at 'by' near 'by'
line 3:58 missing EOF at 'val' near 'by'
This is the result I expect from the query above:
RNK VAL CNT
--- ------------------------------ ---
1 a 2
2 b 1
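I was able to do the same thing in an Oracle database with a similar table. The only difference is that in Oracle I used DECODE in the ORDER BY instead of CASE; since Hive does not support Oracle's DECODE, I cannot do that here.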
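One detail matters when swapping DECODE for CASE: Oracle's DECODE treats two NULLs as equal, but a simple CASE such as "case val when null then 0 else count(*) end" compares val = null, which is never true, so NULL values fall through to the ELSE branch. The DECODE-equivalent form is a searched CASE, "case when val is null then 0 else count(*) end". The minimal sketch below illustrates this with Python's built-in sqlite3 module; SQLite is only a stand-in for Hive here, on the assumption that both follow the standard SQL rule that NULL = NULL is not true.

```python
import sqlite3

# In-memory SQLite database, used only to illustrate standard SQL NULL
# comparison rules; this is not Hive.
conn = sqlite3.connect(":memory:")

# Simple CASE: compares val = NULL, which is never true, so NULL hits ELSE.
simple_case = conn.execute(
    "SELECT CASE val WHEN NULL THEN 0 ELSE 1 END FROM (SELECT NULL AS val)"
).fetchone()[0]

# Searched CASE with IS NULL: matches NULL, like DECODE(val, NULL, 0, ...).
searched_case = conn.execute(
    "SELECT CASE WHEN val IS NULL THEN 0 ELSE 1 END FROM (SELECT NULL AS val)"
).fetchone()[0]

print(simple_case, searched_case)  # 1 0
```

With no NULLs in the sample data both forms give the same ranking, so the difference only shows up once VAL001 contains NULLs.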
Here is the working Oracle SQL query:
SQL> select max(rnk) rnk, max(val) val, sum(cnt) cnt from
(select val, count(*) cnt, row_number() over (order by
decode(val,null,0,count(*)) desc, val) rnk from (select VAL001 val from
table_name ) group by val)
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
RNK VAL CNT
--- ------------------------------ ---
1 a 2
2 b 1
Can anyone help me fix the Hive query? Let me know if you need more details.
Answer 0 (score: 1)
Here is your query, reformatted. I suspect there is a simpler way to get what you want:
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
row_number() over (order by case val when null then 0 else count(*) end desc, val) as rnk
from (select VAL001 val from abcd )
group by val
)
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
I think you just need table aliases for the subqueries in the FROM clause:
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
-- searched CASE: a simple "case val when null" never matches, because val = null is never true
row_number() over (order by case when val is null then 0 else count(*) end desc, val) as rnk
from (select VAL001 val from abcd
) x
group by val
) x
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end;
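A quick way to check the corrected query's logic against the expected output is to run it on the three sample rows. This sketch uses Python's built-in sqlite3 module and assumes SQLite 3.25 or later (needed for row_number()). SQLite does not require subquery aliases, so it cannot reproduce the Hive ParseException; it only confirms the result. It uses the searched CASE and adds an order by so the printed order is deterministic.

```python
import sqlite3

# Reproduction of the question's table abcd in SQLite (not Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abcd (val001 TEXT, val002 INTEGER, val003 INTEGER)")
conn.executemany("INSERT INTO abcd VALUES (?, ?, ?)",
                 [("a", 1, 1), ("b", 2, 2), ("a", 3, 3)])

QUERY = """
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from (select val, count(*) as cnt,
             row_number() over (order by case when val is null then 0 else count(*) end desc,
                                         val) as rnk
      from (select val001 val from abcd) x
      group by val
     ) x
group by case when rnk <= 100 or val is null then rnk else 100 + 1 end
order by rnk  -- added for a deterministic output order
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```

This prints "1 a 2" and "2 b 1", matching the RNK / VAL / CNT table in the question.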
Answer 1 (score: 0)
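This is not technically simpler, but it may be easier to read:
the first subquery does the counting and ranking,
the second assigns each value to a category (top 1 through top 100, plus the special categories other, for everything outside the top 100, and unknown, for NULL values),
and the final query does the grouping.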
with cnt as (
select VAL001 val,
count(*) as cnt,
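-- searched CASE instead of Oracle's DECODE, which Hive does not provide with these semantics
row_number() over (order by case when VAL001 is null then 0 else count(*) end desc, VAL001) as rnk
from abcd
group by VAL001),
ctg as (
select
val, cnt, rnk,
case when val is NULL then 'unknown'
when rnk <= 100 then concat('top ', cast(rnk as string))
else 'other' end as category_code
from cnt)
select
max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from ctg
group by category_code
-- order by the alias; positional "order by 1" is not honored by every Hive version or configuration
order by rnk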
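To see the other and unknown categories at work, the sketch below runs the same CTE logic in SQLite through Python's sqlite3 module on a hypothetical nine-row table, with the cutoff lowered from 100 to a parameter TOP_N = 2. Assumptions: SQLite 3.25 or later for row_number(), and SQLite's || operator for string concatenation in place of Hive's concat().

```python
import sqlite3

TOP_N = 2  # hypothetical cutoff standing in for 100, so "other" appears with little data

# Hypothetical data: a (3x), b (2x), c and d (1x each, ranked below the top 2), NULL (2x).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abcd (val001 TEXT)")
conn.executemany("INSERT INTO abcd VALUES (?)",
                 [("a",), ("a",), ("a",), ("b",), ("b",),
                  ("c",), ("d",), (None,), (None,)])

QUERY = """
with cnt as (
  select val001 as val, count(*) as cnt,
         row_number() over (order by case when val001 is null then 0 else count(*) end desc,
                                     val001) as rnk
  from abcd
  group by val001),
ctg as (
  select val, cnt, rnk,
         case when val is null then 'unknown'
              when rnk <= :top_n then 'top ' || rnk   -- SQLite string concatenation
              else 'other' end as category_code
  from cnt)
select max(rnk) as rnk, max(val) as val, sum(cnt) as cnt
from ctg
group by category_code
order by rnk
"""

rows = conn.execute(QUERY, {"top_n": TOP_N}).fetchall()
for row in rows:
    print(*row)
```

Values ranked beyond TOP_N collapse into a single row (here c and d, reported with the largest rank and value in that bucket), and the two NULL rows form their own unknown row.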