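I need to collapse the rows of a Hive table into a single row per group. The goal is to capture the values other than 'N': in each column where the rows otherwise hold 'N', any value other than 'N' should be captured for the given col1.
Table 1: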
col1 col2 col3 col4 col5 col6
-----------------------------
GHY BG Q N N N
GHY BG N T N N
GHY BG N N A N
GHY BG N N N Z
I tried the following query:
Select col1, col2,array(
max(CASE WHEN col3 == 'Q' THEN 'Q' ELSE 'None' END),
max(CASE WHEN col4 == 'T' THEN 'T' ELSE 'None' END),
max(CASE WHEN col5 == 'A' THEN 'A' ELSE 'None' END),
max(CASE WHEN col6 == 'Z' THEN 'Z' ELSE 'None' END))
FROM table1 GROUP BY col1,col2;
and got the following:
Actual output:
GHY BG ['None','None','A','None']
Expected output:
GHY BG ['Q','T','A','Z']
I can't figure out where the mistake is :(
Update_1:
After removing max from the query:
FAILED: SemanticException [Error 10025]: Line 2:11 Expression not in GROUP BY key 'Q'
Update_2:
select col1,col2,collect_set(col)
from (select col1,col2,t.col
from tbl
lateral view explode(array(col3,col4,col5,col6)) t as col
where t.col <> 'N'
) t
Error:
FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'col1'
Answer 0 (score: 2)
Use explode to get one row per column for each combination of col1 and col2, then aggregate the rows with collect_set.
select col1,col2,collect_set(col)
from (select col1,col2,t.col
from tbl
lateral view explode(array(col3,col4,col5,col6)) t as col
where t.col <> 'N'
) t
group by col1,col2
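For anyone who wants to try this without an existing table, here is a minimal self-contained sketch. It assumes the sample rows from the question, supplied inline through stack(); the CTE name tbl matches the table name used above, while the aliases vals and x are illustrative only. Note the final group by clause: collect_set is an aggregate, and leaving the clause out is what produces the "Expression not in GROUP BY key 'col1'" error shown in Update_2.

```sql
-- Inline test data (assumption: same rows as Table 1 in the question).
with tbl as (
  select stack(4,
    'GHY','BG','Q','N','N','N',
    'GHY','BG','N','T','N','N',
    'GHY','BG','N','N','A','N',
    'GHY','BG','N','N','N','Z') as (col1, col2, col3, col4, col5, col6)
)
select col1, col2, collect_set(col) as vals
from (
  -- One output row per (input row, column); the 'N' placeholders are dropped.
  select col1, col2, t.col
  from tbl
  lateral view explode(array(col3, col4, col5, col6)) t as col
  where t.col <> 'N'
) x
group by col1, col2;  -- required because collect_set() is an aggregate
```

collect_set removes duplicates and does not guarantee element order, so the array may not follow the col3..col6 column order. If duplicates should be kept, collect_list is the usual alternative.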
Answer 1 (score: 2)
This query produces the expected result:
with Table1 as --your test data
(
select stack(4,
'GHY','BG','Q','N','N','N',
'GHY','BG','N','T','N','N',
'GHY','BG','N','N','A','N',
'GHY','BG','N','N','N','Z') as (col1, col2, col3, col4, col5, col6)
)
select col1, col2,array(
nvl(max(CASE WHEN col3 = 'Q' THEN 'Q' END),'None'),
nvl(max(CASE WHEN col4 = 'T' THEN 'T' END),'None'),
nvl(max(CASE WHEN col5 = 'A' THEN 'A' END),'None'),
nvl(max(CASE WHEN col6 = 'Z' THEN 'Z' END),'None'))
from Table1
group by col1, col2;
Result:
GHY BG ["Q","T","A","Z"]
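One way to see why dropping the ELSE branch matters: with ELSE 'None', every non-matching row contributes the string 'None', and max() then compares that string against the wanted letter. Without ELSE, non-matching rows yield NULL, which max() ignores. The sketch below contrasts the two forms on col5, using the same inline test data; the aliases with_else and without_else are illustrative names, not part of the original answer.

```sql
with Table1 as (
  select stack(4,
    'GHY','BG','Q','N','N','N',
    'GHY','BG','N','T','N','N',
    'GHY','BG','N','N','A','N',
    'GHY','BG','N','N','N','Z') as (col1, col2, col3, col4, col5, col6)
)
select
  -- Non-matching rows produce 'None', which takes part in the max() comparison.
  max(case when col5 = 'A' then 'A' else 'None' end) as with_else,
  -- Non-matching rows produce NULL, which max() skips; nvl() supplies the
  -- default only if no row matched at all.
  nvl(max(case when col5 = 'A' then 'A' end), 'None') as without_else
from Table1
group by col1, col2;
```

In a byte-wise string comparison 'None' sorts above 'A' but below 'Q', 'T' and 'Z', so a placeholder inside the aggregate can displace a real value. Applying the default with nvl() after aggregation avoids that dependence on sort order.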
Answer 2 (score: 1)
Another possible solution (inspired by the ones already provided) is:
select col1, col2, array(max(col3), max(col4), max(col5), max(col6))
from table1
group by col1, col2;
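Note:
max() returns the largest value under string comparison, so the placeholder for unwanted values must sort below every value you want to keep; the suggestion here is to replace it with something like 'aa'. Otherwise the placeholder itself may be selected, as 'N' is in Example 1. Whether a given placeholder sorts low enough depends on the comparison rules: under a byte-wise, case-sensitive comparison, lowercase letters sort above uppercase ones, so verify the ordering on your system.
Example 1: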
col1 col2 col3 col4 col5 col6
-----------------------------
GHY BG Q N N N
GHY BG N T N N
GHY BG N N A N
GHY BG N N N Z
Result:
['Q','T','N','Z']
Example 2:
col1 col2 col3 col4 col5 col6
-----------------------------
GHY BG Q a a a
GHY BG a T a a
GHY BG a a A a
GHY BG a a a Z
Result:
['Q','T','A','Z']
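A variant that does not depend on how the placeholder sorts is to turn it into NULL before aggregating, since max() skips NULLs. This is only a sketch, assuming 'N' is the only unwanted value and each column holds at most one wanted value per group; the inline test data and the alias vals are illustrative.

```sql
with Table1 as (
  select stack(4,
    'GHY','BG','Q','N','N','N',
    'GHY','BG','N','T','N','N',
    'GHY','BG','N','N','A','N',
    'GHY','BG','N','N','N','Z') as (col1, col2, col3, col4, col5, col6)
)
select col1, col2,
  array(
    -- CASE without ELSE yields NULL for 'N', so only real values reach max().
    max(case when col3 <> 'N' then col3 end),
    max(case when col4 <> 'N' then col4 end),
    max(case when col5 <> 'N' then col5 end),
    max(case when col6 <> 'N' then col6 end)
  ) as vals
from Table1
group by col1, col2;
```

If a column contains only 'N' within a group, the corresponding array element is NULL; wrapping each max(...) in nvl(..., 'None') would supply a default instead.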