Array operations in Hive (adding arrays)

Time: 2018-03-27 15:37:32

Tags: arrays hive explode

I have a Hive table with columns id (String) and val (String):

id,val
abc,{0|1|0}
abc,{0|1|1}
abc,{1|0|1|1}

I want to add up the val column element-wise, grouped by the id column. The expected result is:

id,val
abc,{1|2|2|1}

This result can be obtained by adding the arrays in parallel, that is, position by position.
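To pin down what adding the arrays position by position means, here is a minimal Python sketch of the computation outside Hive. The helper names `parse_val` and `add_elementwise` are made up for illustration, and it assumes shorter arrays are padded with zeros, which is what the three sample rows and the sum {1|2|2|1} imply.

```python
from itertools import zip_longest

def parse_val(val):
    """Turn a string such as '{0|1|0}' into [0, 1, 0]."""
    return [int(x) for x in val.strip("{}").split("|")]

def add_elementwise(vals):
    """Sum arrays position by position; missing positions count as 0 (assumption)."""
    arrays = [parse_val(v) for v in vals]
    return [sum(col) for col in zip_longest(*arrays, fillvalue=0)]

rows = ["{0|1|0}", "{0|1|1}", "{1|0|1|1}"]
print(add_elementwise(rows))  # [1, 2, 2, 1]
```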

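I tried using lateral view explode and then casting to int, among other things, but could not get the expected result. I know a UDF is also an option, but is there a way to do this with Hive alone?

Any suggestions would be helpful.

Thanks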

3 answers:

Answer 0: (score: 1)

First strip the { and } characters, split the string on |, then use lateral view posexplode to sum the numbers that share the same position:

-- mytab is a placeholder: the original query omits the table name
select id, pos, sum(cast(split_val as int)) as total
from mytab
lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) tbl as pos, split_val
group by id, pos
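The intermediate rows this query produces can be pictured with a small Python sketch. It is only an emulation under assumptions: `posexplode` here is a hypothetical stand-in for the Hive function applied to the split string (positions start at 0), and `sum_by_position` stands in for the grouped sum over (id, pos).

```python
from collections import defaultdict

def posexplode(val):
    """Yield (pos, element) pairs after stripping braces and splitting on '|'."""
    cleaned = val.replace("{", "").replace("}", "")
    for pos, item in enumerate(cleaned.split("|")):
        yield pos, item

def sum_by_position(rows):
    """Emulate the grouped sum: one total per (id, pos) pair."""
    totals = defaultdict(int)
    for row_id, val in rows:
        for pos, item in posexplode(val):
            totals[(row_id, pos)] += int(item)
    return dict(totals)

rows = [("abc", "{0|1|0}"), ("abc", "{0|1|1}"), ("abc", "{1|0|1|1}")]
for key, total in sorted(sum_by_position(rows).items()):
    print(key, total)
```

Each printed line corresponds to one row of the grouped result: an (id, pos) key and its total.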

Then use collect_list to build the final array.

select id, collect_list(total)
from (select id, pos, sum(cast(split_val as int)) as total
      from mytab  -- placeholder: the original query omits the table name
      lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) tbl as pos, split_val
      group by id, pos
     ) t
group by id
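One caveat worth checking on your own cluster: as far as I know, Hive does not promise the order in which collect_list receives rows, so the collected totals may not come back in position order. The Python sketch below (hypothetical names, not Hive code) shows the intent of this final step: sort the per-position totals by pos before collecting them per id.

```python
from collections import defaultdict

def collect_in_order(position_totals):
    """Group (id, pos, total) rows by id and collect totals sorted by pos."""
    grouped = defaultdict(list)
    for row_id, pos, total in position_totals:
        grouped[row_id].append((pos, total))
    return {row_id: [t for _, t in sorted(pairs)]
            for row_id, pairs in grouped.items()}

# Rows deliberately out of order, as they might arrive from a grouped subquery.
shuffled = [("abc", 2, 2), ("abc", 0, 1), ("abc", 3, 1), ("abc", 1, 2)]
print(collect_in_order(shuffled))  # {'abc': [1, 2, 2, 1]}
```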

Answer 1: (score: 1)

This is one possible way; there may be a better one.

select * from tbl1;
+----------+------------+--+
| tbl1.id  |  tbl1.val  |
+----------+------------+--+
| abc      | {0|1|0}    |
| abc      | {0|1|1}    |
| abc      | {1|0|1|1}  |
+----------+------------+--+

Write it somewhere without the {}:

insert overwrite directory '/user/cloudera/tbl2'
row format delimited fields terminated by ','
select id, substr(val,2,length(val)-2) as val2 from tbl1

Create a table to use it:

create external table tbl3(id string, val array<int>)
row format delimited
fields terminated by ','
collection items terminated by '|'
location '/user/cloudera/tbl2'

+----------+------------+--+
| tbl3.id  |  tbl3.val  |
+----------+------------+--+
| abc      | [0,1,0]    |
| abc      | [0,1,1]    |
| abc      | [1,0,1,1]  |
+----------+------------+--+

Use posexplode to get the result:

select id, collect_list(val) 
from (
  select id, sum(c) as val 
    from (
      select id, i, c from tbl3 
      lateral view posexplode(val) v1 as i, c 
    ) tbl 
  group by id, i
  ) tbl2 
group by id
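The intermediate text file used in this answer can be pictured with a short Python sketch. It mimics substr(val, 2, length(val)-2), which drops the first and last character, and then parses a line with ',' as the field delimiter and '|' as the array item delimiter, matching the delimiters named in the answer. The function names are illustrative only and this is not how Hive itself reads the file.

```python
def strip_braces(val):
    """Equivalent of Hive's substr(val, 2, length(val) - 2), which is 1-based."""
    return val[1:1 + len(val) - 2]

def to_text_line(row_id, val):
    """One comma-delimited line, as written to the output directory."""
    return f"{row_id},{strip_braces(val)}"

def parse_line(line):
    """Read a line as (id string, array of int) using ',' and '|' delimiters."""
    row_id, items = line.split(",", 1)
    return row_id, [int(x) for x in items.split("|")]

line = to_text_line("abc", "{1|0|1|1}")
print(line)              # abc,1|0|1|1
print(parse_line(line))  # ('abc', [1, 0, 1, 1])
```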

Answer 2: (score: 0)

Hive table mytab:

+----------+------------+
|    id    |     val    |
+----------+------------+
|   abc    | {0|1|0}    |
|   abc    | {0|1|1}    |
|   abc    | {1|0|1|1}  |
+----------+------------+

Expected output:

+----------+------------+
|    id    |     val    |
+----------+------------+
|   abc    | {1|2|2|1}  |
+----------+------------+

Hive query used:

select id,concat('{',concat_ws('|',(collect_list(cast(cast(expl_val_sum as int)as string)))),'}') as coll_expl_val 
from(
select id,index,sum(expl_val) as expl_val_sum
from mytab 
lateral view posexplode(split(regexp_replace(val,'[{}]',''),'\\|')) exp as index,expl_val
group by id,index)a
group by id;

1. First, posexplode is used to explode the array<string> produced by split.
2. Then the array values are summed position by position, grouped on the index column.
3. Then cast as int converts the sums from decimal values to integers.
4. Then the values are cast to string and gathered back into an array<string> with collect_list.
5. Next, concat_ws joins the array values with '|' as the delimiter.
6. Finally, concat adds the surrounding '{' and '}'.
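Steps 3 to 6 are pure formatting, and a rough Python equivalent may make them easier to follow. `format_sums` is a hypothetical helper, and the input is assumed to be the per-position sums as decimal values, already in position order.

```python
def format_sums(sums):
    """Cast each sum to int, then to string, join with '|', wrap in braces."""
    as_strings = [str(int(s)) for s in sums]   # like cast(cast(x as int) as string)
    return "{" + "|".join(as_strings) + "}"    # like concat('{', concat_ws('|', ...), '}')

print(format_sums([1.0, 2.0, 2.0, 1.0]))  # {1|2|2|1}
```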

Thanks for all your replies.