Aggregating over a struct column in Hive

Date: 2017-06-21 21:22:01

Tags: arrays hadoop struct hive bigdata

I have an array of structs, and I am trying to find the count, sum, and number of distinct values of one of the struct's fields.

create table temp (regionkey smallint, name string, comment string, nations array<struct<n_nationkey:smallint,n_name:string,n_comment:string>>) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '|' 
MAP KEYS TERMINATED BY ',';

When I try to run the query

select name, 
count(nations.n_nationkey) as count, 
sum(nations.n_nationkey) as sum, 
ndv(nations.n_nationkey) as distinct_val 
from temp 
group by name 
order by name;

I get the error

FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array<smallint> is passed.

What I want is the count, the sum, and the number of distinct values of n_nationkey for each name.

Any help would be highly appreciated.

1 answer:

Answer 0 (score: 1)

select      t.name 
           ,count   (e.col.n_nationkey)             as count 
           ,sum     (e.col.n_nationkey)             as sum
           ,count   (distinct e.col.n_nationkey)    as distinct_val 

from        temp t lateral view explode (t.nations) e

group by    t.name 

order by    t.name
;

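For the OP

The same solution, with aliases.
nations is not a struct. It is an array of structs, so it has no n_nationkey attribute of its own; its struct elements do. The explode function takes the array of structs (nations) and returns each struct (nation) as a separate row.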
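
The exploded rows described above can be inspected directly before any aggregation. This is only a sketch against the temp table from the question; the output column aliases are chosen here for illustration.

```sql
-- One output row per struct element of the nations array;
-- name is repeated for every element that came from the same row of temp.
select      t.name
           ,e.nation.n_nationkey    as n_nationkey
           ,e.nation.n_name         as n_name

from        temp t lateral view explode (t.nations) e as nation
;
```

The aggregate query with the same alias then groups these rows by name: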

select      t.name 
           ,count   (e.nation.n_nationkey)             as count 
           ,sum     (e.nation.n_nationkey)             as sum
           ,count   (distinct e.nation.n_nationkey)    as distinct_val 

from        temp t lateral view explode (t.nations) e as nation

group by    t.name 

order by    t.name
;
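
One caveat, assuming some rows of temp may hold an empty or NULL nations array (the question does not say): a plain lateral view produces no rows for them, so those names disappear from the result. Hive's lateral view outer keeps such rows, with the exploded column set to NULL. A hedged variant of the query above:

```sql
-- "outer" keeps rows of temp whose nations array is empty or NULL;
-- for those rows e.nation is NULL, so count(...) yields 0 and sum(...) yields NULL.
select      t.name
           ,count   (e.nation.n_nationkey)             as count
           ,sum     (e.nation.n_nationkey)             as sum
           ,count   (distinct e.nation.n_nationkey)    as distinct_val

from        temp t lateral view outer explode (t.nations) e as nation

group by    t.name

order by    t.name
;
```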