蜂巢多选

时间:2018-10-29 09:20:24

标签: hive

CID    F_ID       NME
1       A          QR
1       B          QB
2       A          QR
3       B          QB
4       A          QR
4       B          QB

结果:-

CID F_ID    NME
1    A      QR
1    B      QB
4    A      QR
4    B      QB

在Hive中,要获取结果的查询仅应得出CID 都属于F_ID-A和B,我可以在oracle中使用LISTAGG达到相同的效果

1 个答案:

答案 0 :(得分:1)

此查询将在单个map-reduce阶段执行:

select CID,    F_ID,       NME  from
(
select s.*, 
       sum(A) over (partition by CID) A_cnt,
       sum(B) over (partition by CID) B_cnt
from
(
select s.*,
       case when F_ID='A' then 1 else 0 end A,
       case when F_ID='B' then 1 else 0 end B
 from your_table
)s
)s where A_cnt>=1 and B_cnt >=1
;

演示:

select CID,    F_ID,       NME  from
(
select s.*, 
       sum(A) over (partition by CID) A_cnt,
       sum(B) over (partition by CID) B_cnt
from
(
select s.*,
       case when F_ID='A' then 1 else 0 end A,
       case when F_ID='B' then 1 else 0 end B
 from
( --replace this subquery (s) with your table  
  select stack(6,
  1,       'A',          'QR',
  1,       'B',          'QB',
  2,       'A',          'QR',
  3,       'B',          'QB',
  4,       'A',          'QR',
  4,       'B',          'QB') as (CID,    F_ID,       NME)
) s
)s
)s where A_cnt>=1 and B_cnt >=1
;

结果:

MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.39 sec   HDFS Read: 13549 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 390 msec
OK
1       B       QB
1       A       QR
4       B       QB
4       A       QR
Time taken: 108.779 seconds, Fetched: 4 row(s)