Attempting a transpose in Hive by joining a table to subsets of itself multiple times

Time: 2019-02-07 14:27:15

Tags: join hive transpose

I am trying to transpose the date column of table data_A by joining the table to subsets of itself several times:

Below is the code that creates my test dataset. Each (member_id, x1, x2) combination is repeated once for every value of count:

create table database.data_A (member_id string, x1 int, x2 int, count int, date date);
insert into table database.data_A
select 'A0001',1, 10, 1, '2017-01-01' 
union all
select 'A0001',1, 10, 2, '2017-07-01'
union all
select 'A0001',2, 20, 1, '2017-01-01'
union all
select 'A0001',2, 20, 2, '2017-07-01'
union all
select 'B0001',3, 50, 1, '2017-03-01'
union all
select 'C0001',4, 100, 1, '2017-04-01'
union all
select 'D0001',5, 200, 1, '2017-10-01' 
union all
select 'D0001',5, 200, 2, '2017-11-01'
union all
select 'D0001',5, 200, 3, '2017-12-01'
union all
select 'D0001',6, 500, 1, '2017-10-01'
union all
select 'D0001',6, 500, 2, '2017-11-01'
union all
select 'D0001',6, 500, 3, '2017-12-01'
union all
select 'D0001',7, 1000, 1, '2017-10-01'
union all
select 'D0001',7, 1000, 2, '2017-11-01'
union all
select 'D0001',7, 1000, 3, '2017-12-01';

I would like to transpose the data into this:

member_id  x1  x2    date1         date2         date3
'A0001'    1   10    '2017-01-01'  '2017-07-01'  .
'A0001'    2   20    '2017-01-01'  '2017-07-01'  .
'B0001'    3   50    '2017-03-01'  .             .
'C0001'    4   100   '2017-04-01'  .             .
'D0001'    5   200   '2017-10-01'  '2017-11-01'  '2017-12-01'
'D0001'    6   500   '2017-10-01'  '2017-11-01'  '2017-12-01'
'D0001'    7   1000  '2017-10-01'  '2017-11-01'  '2017-12-01'

My first attempt (unsuccessful):

create table database.data_B as 
select a.member_id, a.x1, a.x2, a.date_1, b.date_2, c.date_3
from (select member_id, x1, x2, date as date_1 from database.data_A where count=1) as a
left join
(select member_id, date as date_2 from database.data_A where count=2) as b
on (a.member_id=b.member_id)
left join 
(select member_id, date as date_3 from database.data_A where count=3) as c
on (a.member_id=c.member_id);
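
A likely reason the attempt above falls short is that the joins match on member_id only, so every count=1 row of a member pairs with every count=2 and count=3 row of that member, regardless of x1 and x2. The sketch below replays the query in an in-memory SQLite database (an assumption made purely so it can be run locally; it is not Hive) and compares it with a variant that also joins on x1 and x2. The backticks around count and date, the `run_join` helper, and the selection of x1 and x2 in subqueries b and c are illustrative additions, not part of the original post.

```python
import sqlite3

# Test rows copied from the question: (member_id, x1, x2, count, date).
ROWS = [
    ("A0001", 1, 10, 1, "2017-01-01"), ("A0001", 1, 10, 2, "2017-07-01"),
    ("A0001", 2, 20, 1, "2017-01-01"), ("A0001", 2, 20, 2, "2017-07-01"),
    ("B0001", 3, 50, 1, "2017-03-01"), ("C0001", 4, 100, 1, "2017-04-01"),
    ("D0001", 5, 200, 1, "2017-10-01"), ("D0001", 5, 200, 2, "2017-11-01"),
    ("D0001", 5, 200, 3, "2017-12-01"), ("D0001", 6, 500, 1, "2017-10-01"),
    ("D0001", 6, 500, 2, "2017-11-01"), ("D0001", 6, 500, 3, "2017-12-01"),
    ("D0001", 7, 1000, 1, "2017-10-01"), ("D0001", 7, 1000, 2, "2017-11-01"),
    ("D0001", 7, 1000, 3, "2017-12-01"),
]

# Same shape as the query in the question; the join conditions are filled in
# later so that both variants share one statement.
JOIN_SQL = """
select a.member_id, a.x1, a.x2, a.date_1, b.date_2, c.date_3
from (select member_id, x1, x2, `date` as date_1 from data_A where `count` = 1) as a
left join
     (select member_id, x1, x2, `date` as date_2 from data_A where `count` = 2) as b
  on ({on_b})
left join
     (select member_id, x1, x2, `date` as date_3 from data_A where `count` = 3) as c
  on ({on_c})
order by a.member_id, a.x1
"""


def run_join(keys):
    """Run the self-join with the given join keys and return all result rows."""
    on_b = " and ".join(f"a.{k} = b.{k}" for k in keys)
    on_c = " and ".join(f"a.{k} = c.{k}" for k in keys)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table data_A (member_id text, x1 int, x2 int, `count` int, `date` text)"
    )
    conn.executemany("insert into data_A values (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(JOIN_SQL.format(on_b=on_b, on_c=on_c)).fetchall()
    conn.close()
    return rows


loose = run_join(["member_id"])               # join keys as in the question
strict = run_join(["member_id", "x1", "x2"])  # join on the full row key
print(len(loose), len(strict))                # prints: 33 7
```

With member_id as the only key, D0001 alone contributes 3 x 3 x 3 = 27 rows; joining on member_id, x1 and x2 gives the seven expected rows, with NULL where a date is missing.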

1 answer:

Answer 0: (score: 1)

The following will do the job. It pivots with conditional aggregation instead of self-joins:

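 -- One output row per (member_id, x1, x2); each CASE picks the date for one count value.
 -- The source column is `date`; it is cast to string so it can share a column with '.'.
 select
 member_id,
 x1,
 x2,
 max(case when `count`=1 then cast(`date` as string) else '.' end) as date1,
 max(case when `count`=2 then cast(`date` as string) else '.' end) as date2,
 max(case when `count`=3 then cast(`date` as string) else '.' end) as date3
 from database.data_A
 group by member_id, x1, x2;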
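
To check the conditional-aggregation idea against the question's test data, here is a minimal sketch that runs the same kind of query in an in-memory SQLite database (again an assumption for local testing, not Hive itself). It differs from the answer in one deliberate way: the `else '.'` branch is left out, so missing dates come back as NULL rather than a '.' placeholder, which avoids mixing strings and dates in one column. The `PIVOT_SQL` and `pivot` names are illustrative.

```python
import sqlite3

# Test rows copied from the question: (member_id, x1, x2, count, date).
ROWS = [
    ("A0001", 1, 10, 1, "2017-01-01"), ("A0001", 1, 10, 2, "2017-07-01"),
    ("A0001", 2, 20, 1, "2017-01-01"), ("A0001", 2, 20, 2, "2017-07-01"),
    ("B0001", 3, 50, 1, "2017-03-01"), ("C0001", 4, 100, 1, "2017-04-01"),
    ("D0001", 5, 200, 1, "2017-10-01"), ("D0001", 5, 200, 2, "2017-11-01"),
    ("D0001", 5, 200, 3, "2017-12-01"), ("D0001", 6, 500, 1, "2017-10-01"),
    ("D0001", 6, 500, 2, "2017-11-01"), ("D0001", 6, 500, 3, "2017-12-01"),
    ("D0001", 7, 1000, 1, "2017-10-01"), ("D0001", 7, 1000, 2, "2017-11-01"),
    ("D0001", 7, 1000, 3, "2017-12-01"),
]

# A CASE without ELSE yields NULL for non-matching rows, and max() skips NULLs,
# so each group keeps only the date belonging to the requested count value.
PIVOT_SQL = """
select member_id, x1, x2,
       max(case when `count` = 1 then `date` end) as date1,
       max(case when `count` = 2 then `date` end) as date2,
       max(case when `count` = 3 then `date` end) as date3
from data_A
group by member_id, x1, x2
order by member_id, x1
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table data_A (member_id text, x1 int, x2 int, `count` int, `date` text)"
)
conn.executemany("insert into data_A values (?, ?, ?, ?, ?)", ROWS)
pivot = conn.execute(PIVOT_SQL).fetchall()
conn.close()

for row in pivot:
    print(row)
```

The table is scanned once and grouped, so the result has exactly one row per (member_id, x1, x2) and no join keys to get wrong. If the '.' placeholder is required in the output, wrapping each aggregate in coalesce(..., '.') is one way to add it back.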