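I have multiple files with the same columns, and I am trying to aggregate the values in two of the columns using SUM.
The column structure is below:
ID first_count second_count name desc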
1 10 10 A A_Desc
1 25 45 A A_Desc
1 30 25 A A_Desc
2 20 20 B B_Desc
2 40 10 B B_Desc
How can I sum first_count and second_count per ID, to get the following?
ID first_count second_count name desc
1 65 80 A A_Desc
2 60 30 B B_Desc
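Below is the script I wrote, but when I run it I get the error "Could not infer the matching function for SUM as multiple or none of them fit. Please use an explicit cast."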
A = LOAD '/output/*/part*' AS (id:chararray,first_count:chararray,second_count:chararray,name:chararray,desc:chararray);
B = GROUP A BY id;
C = FOREACH B GENERATE group as id,
SUM(A.first_count) as first_count,
SUM(A.second_count) as second_count,
A.name as name,
A.desc as desc;
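Answer 0 (score: 1)
Your LOAD statement is wrong: first_count and second_count are loaded as chararray, and SUM cannot add strings. If you are sure these columns only ever contain numbers, load them as int instead. Try this: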
A = LOAD '/output/*/part*' AS (id:chararray,first_count:int,second_count:int,name:chararray,desc:chararray);
It should work.
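For reference, here is a fuller sketch of the script with the int schema applied. It is one possible way to reach the output shown in the question, not a tested solution. Two things in it go beyond the answer above and are assumptions. First, it groups by (id, name, description) so that the name and description come out as plain values; in the original FOREACH, A.name and A.desc would each be a bag holding every value in the group. Second, it renames the last field to description, a hypothetical alias chosen to steer clear of desc, which Pig also uses as the ORDER BY keyword. It further assumes that name and description are constant for a given id, and that the files are tab-delimited (the PigStorage default).

```pig
-- Load the counts as int so that SUM has a numeric type to work with.
-- 'description' is a hypothetical alias for the last column (see note above).
A = LOAD '/output/*/part*'
    AS (id:chararray, first_count:int, second_count:int,
        name:chararray, description:chararray);

-- Group on every column that should appear once per output row.
-- Assumes name and description never vary within a single id.
B = GROUP A BY (id, name, description);

-- FLATTEN turns the group tuple back into three scalar fields.
-- SUM over an int column produces a long.
C = FOREACH B GENERATE
        FLATTEN(group) AS (id, name, description),
        SUM(A.first_count)  AS first_count,
        SUM(A.second_count) AS second_count;

DUMP C;
```

If name or description can differ between rows that share an id, this grouping would produce more than one row per id, and grouping by id alone would be the safer choice.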