HiveQL - Aggregating multiple rows into a single row

Time: 2016-12-29 10:28:47

Tags: hive hiveql

I'm struggling to pivot multiple rows of data into columns on a single row. Here is an example of my table:

site        country    users    country_rank
cnn.com     840        10000    1
cnn.com     31         4000     3
cnn.com     556        6000     2
rt.com      840        200      3
rt.com      33         6000     2
rt.com      400        10000    1

The result I want is the top 2 countries by user count for each site, laid out on a single row:

site       country_1    country_1_share     country_2     country_2_share
cnn.com    840          10000               556           6000
rt.com     400          10000               33            6000

I have tried a few different approaches:

select site, country_1, country_1_share,country_2,country_2_share
from (
  select site
  ,max(CASE WHEN country_rank = 1 THEN country END) AS country_1 
  ,max(CASE WHEN country_rank = 1 THEN users END) as country_1_share 
  ,max(CASE WHEN country_rank = 2 THEN country END) AS country_2
  ,max(CASE WHEN country_rank = 2 THEN users END) as country_2_share 
  from t1
  group by site
)

And also:

select a.site, a.country_1, b.country_1_share,c.country_2,d.country_2_share
from (
  select site, country as country_1
  from t1
  where max(CASE WHEN country_rank = 1 THEN country END)) a
 JOIN (
  select site, users as country_1_share
  from t1
  where max(CASE WHEN country_rank = 1 THEN users END)) b on (a.site=b.site)
 JOIN (
  select site, country as country_2
  from t1
  where max(CASE WHEN country_rank = 2 THEN country END)) c on (a.site = c.site)
 JOIN (
  select site, users as country_2_share
  from t1
  where max(CASE WHEN country_rank = 2 THEN users END)) d on (a.site = c.site)

Any insight is greatly appreciated!

1 Answer:

Answer 0: (score: 1)

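This works on Hive 1.2.1. It is your first query with one change: the subquery in the FROM clause is given an alias (s), which Hive requires for a subquery used as a table.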

drop table if exists t1;

create table t1 
as
select 'cnn.com' site,  840 country ,  10000 users,   1 country_rank union all
select 'cnn.com' site,  31  country ,  4000  users,   3 country_rank union all
select 'cnn.com' site,  556 country ,  6000  users,   2 country_rank union all
select 'rt.com'  site,  840 country ,  200   users,   3 country_rank union all
select 'rt.com'  site,  33  country ,  6000  users,   2 country_rank union all
select 'rt.com'  site,  400 country ,  10000 users,   1 country_rank;

select site, country_1, country_1_share,country_2,country_2_share
from (
  select site
  ,max(CASE WHEN country_rank = 1 THEN country END) AS country_1 
  ,max(CASE WHEN country_rank = 1 THEN users END) as country_1_share 
  ,max(CASE WHEN country_rank = 2 THEN country END) AS country_2
  ,max(CASE WHEN country_rank = 2 THEN users END) as country_2_share 
  from t1
  group by site
)s;



 OK
site    country_1       country_1_share country_2       country_2_share
cnn.com 840     10000   556     6000
rt.com  400     10000   33      6000
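
The second attempt in the question fails for a different reason: an aggregate such as max() cannot be used in a WHERE clause. If a join-based version is preferred, a minimal sketch is to filter each subquery on country_rank directly and join the two on site. The aliases r1 and r2 are illustrative names, and the sketch assumes each site has at most one row per country_rank value; with the sample data it should return the same two rows as the query above.

```sql
-- Sketch of the join-based approach, assuming at most one row
-- per (site, country_rank) in t1. r1 and r2 are illustrative aliases.
select r1.site
      ,r1.country as country_1
      ,r1.users   as country_1_share
      ,r2.country as country_2
      ,r2.users   as country_2_share
from (
  -- top-ranked country for each site
  select site, country, users
  from t1
  where country_rank = 1
) r1
left outer join (
  -- second-ranked country for each site
  select site, country, users
  from t1
  where country_rank = 2
) r2 on (r1.site = r2.site);
```

The left outer join keeps sites that have only one ranked country, leaving country_2 and country_2_share as NULL. The conditional-aggregation query above reads t1 once and avoids the join, so it is likely the better choice as the number of ranks grows.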