I have a dataset in which one column holds space-separated lists of numbers stored as strings.
Column_A|Column_B
AAA     |1 23 56 89 74 52
BBB     |63 99 44 2 80 87 58 63
CCC     |96 45 23 84 62 74
For the data above, I need to add up the values in Column_B for each row, like this:
Column_A|Column_B               |Column_C
AAA     |1 23 56 89 74 52       |295
BBB     |63 99 44 2 80 87 58 63 |496
CCC     |96 45 23 84 62 74      |384
I used the cast function to convert the data type from string to integer, with the following query:
select Column_A,cast (Column_B as INT) as Column_B from Xyz
However, summing the values is proving to be a big challenge. Can someone help me?
I am also learning RegEx. Can this be done with RegEx?
Answer 0: (score: 1)
Explode your column using split (by space) and aggregate with sum.
Here is a demo in Hive:
with your_data as
(
  select Column_A,Column_B from
  (
    select stack(3,
      'AAA','1 23 56 89 74 52',
      'BBB','63 99 44 2 80 87 58 63',
      'CCC','96 45 23 84 62 74'
    ) as (Column_A,Column_B)
  )s
) --Use your table instead of this CTE
select Column_A,Column_B, sum(cast(b.val_b as int)) as Column_C
from your_data a
lateral view outer explode(split(Column_B,' ')) b as val_b
group by Column_A,Column_B;
Result:
OK
AAA 1 23 56 89 74 52 295
BBB 63 99 44 2 80 87 58 63 496
CCC 96 45 23 84 62 74 384
Time taken: 53.228 seconds, Fetched: 3 row(s)
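To make the explode step easier to follow, here is a minimal sketch of what happens before the sum. The first statement illustrates why the cast in the question does not help: as far as I know, Hive returns NULL when a string such as '1 23 56 89 74 52' is cast to int as a whole, because it is not a single valid integer. The second statement shows the intermediate rows that sum() aggregates. The column names whole_cast and val_int and the single hard-coded row are only for illustration. The pattern '\\s+' is my assumption, not part of the answer above: the second argument of split is a regular expression, so this is one place where RegEx fits, and it tolerates repeated spaces between numbers.

```sql
-- Casting the whole list at once: expected to yield NULL,
-- because the string is not one integer.
select cast('1 23 56 89 74 52' as int) as whole_cast;

-- Splitting first, then exploding: one output row per list element.
-- '\\s+' is an assumed alternative to ' ' that matches one or more
-- whitespace characters.
select a.Column_A,
       b.val_b,                          -- element as a string
       cast(b.val_b as int) as val_int   -- element as an integer
from (select 'AAA' as Column_A, '1 23 56 89 74 52' as Column_B) a
lateral view outer explode(split(a.Column_B, '\\s+')) b as val_b;
```

The second query should return one row per element of the list for AAA. Grouping those rows back by Column_A, Column_B and summing val_int gives Column_C. Note that the group by also merges rows that are exact duplicates in both columns, so if your table can contain such duplicates, include a unique key in the group by.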
Alternatively, if the maximum number of elements in the list is fixed, you can do the same without explode, which will be faster:
create temporary macro cast_value(s string) nvl(cast(s as int),0);
with your_data as
(
  select Column_A,Column_B from
  (
    select stack(3,
      'AAA','1 23 56 89 74 52',
      'BBB','63 99 44 2 80 87 58 63',
      'CCC','96 45 23 84 62 74'
    ) as (Column_A,Column_B)
  )s
) --Use your table instead of this CTE
select Column_A,Column_B,
  cast_value(col_B_array[0])+
  cast_value(col_B_array[1])+
  cast_value(col_B_array[2])+
  cast_value(col_B_array[3])+
  cast_value(col_B_array[4])+
  cast_value(col_B_array[5])+
  cast_value(col_B_array[6])+
  cast_value(col_B_array[7])+
  cast_value(col_B_array[8])+
  cast_value(col_B_array[9]) as Column_C
from(
  select Column_A,Column_B, split(Column_B,' ') col_B_array
  from your_data a
)s;
Result:
OK
AAA 1 23 56 89 74 52 295
BBB 63 99 44 2 80 87 58 63 496
CCC 96 45 23 84 62 74 384
Time taken: 0.82 seconds, Fetched: 3 row(s)
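The fixed-index version relies on how Hive treats an index past the end of an array. The sketch below shows that behaviour on a hard-coded three-element list. The column names second_element, past_the_end and padded are hypothetical, and the nvl(cast(...),0) expression is written inline here instead of calling the cast_value macro.

```sql
-- An index beyond the last element yields NULL rather than an error,
-- and nvl(...,0) turns that NULL into 0 so the addition still works.
select split('1 23 56', ' ')[1]                       as second_element, -- the string '23'
       split('1 23 56', ' ')[9]                       as past_the_end,   -- NULL: only 3 elements
       nvl(cast(split('1 23 56', ' ')[9] as int), 0)  as padded;         -- 0
```

This is also why the maximum list length has to be known in advance: the query above sums indexes 0 through 9, so any element at position 10 or later would be left out of Column_C without any warning.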