Hive: subquery problem

Time: 2019-02-14 10:17:01

Tags: sql database hive hiveql hue

My problem statement is:

"Find the top 2 districts by population in each state"

The data looks like this:

Input

My expected output is:

output

I have tried many queries and subqueries, but the subqueries cause SQL errors.

Can anyone help me get this result?

Thanks.

The queries I tried:

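  1. select state_name, (select concat_ws(',', collect_set(dist_name as string)) from population where state_name = state_name group by state_name order by population desc limit 2) from population group by state_name

  2. select
    state_name, concat_ws(',',collect_set(cast(dist_name as string)))
    from population where population.dist_name in (select dist_name from (select dist_name, max(b.population) as total from population b where state_name = b.state_name group by b.dist_name, b.dist_name order by total desc limit 2) as dist_name) group by state_name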

1 answer:

Answer 0 (score: 0)

Here is the query:

 select A.state, collect_set(A.dist)[0], collect_set(A.dist)[1]
 from (
   select state, dist,
          row_number() over (partition by state order by population desc) as rnk
   from <tableName>
 ) A
 where A.rnk <= 2
 group by A.state;

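Here is how `collect_set(...)[0]` and `collect_set(...)[1]` behave on a sample table (`hier`). Each parent has only one child here, so the second element is NULL: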

hive> select * from hier;
OK
C1      C11
C11     C12
C12     123
P1      C1
P2      C2

hive> select parent, collect_set(child)[0], collect_set(child)[1] from hier group by parent;
OK
C1      C11     NULL
C11     C12     NULL
C12     123     NULL
P1      C1      NULL
P2      C2      NULL
Time taken: 19.212 seconds, Fetched: 5 row(s)
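
One caveat with the query in this answer: `collect_set` does not guarantee the order of its elements, so `[0]` is not necessarily the most populous district. The following is a minimal sketch of a variant that pins each rank to its own column with conditional aggregation. It assumes a table named `population` with columns `state_name`, `dist_name`, and `population` (names taken from the queries in the question); the output aliases `top_dist_1` and `top_dist_2` are hypothetical.

```sql
-- Sketch only: assumes a table `population` with columns
-- state_name, dist_name, population (as used in the question's queries).
select
    state_name,
    -- rnk = 1 is the most populous district in the state, rnk = 2 the runner-up
    max(case when rnk = 1 then dist_name end) as top_dist_1,
    max(case when rnk = 2 then dist_name end) as top_dist_2
from (
    select
        state_name,
        dist_name,
        -- number the districts within each state, largest population first
        row_number() over (partition by state_name order by population desc) as rnk
    from population
) ranked
where rnk <= 2
group by state_name;
```

If a single comma-separated string is wanted, as in the question's attempts, the two columns can be combined with `concat_ws(',', ...)`. Note that `row_number()` breaks ties arbitrarily; `rank()` or `dense_rank()` would keep tied districts but may then return more than two rows per state.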