Summing multiple columns using Pig

Date: 2015-07-16 19:30:04

Tags: hadoop sum apache-pig

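I have multiple files with the same columns, and I am trying to aggregate the values in two of those columns using SUM.

The column structure is as follows: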
ID first_count second_count name desc
1  10          10           A    A_Desc
1  25          45           A    A_Desc
1  30          25           A    A_Desc
2  20          20           B    B_Desc
2  40          10           B    B_Desc

How can I sum first_count and second_count for each ID, so that I get the following result?

ID first_count second_count name desc
1  65          80           A    A_Desc
2  60          30           B    B_Desc

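Below is the script I wrote, but when I execute it I get the error "Could not infer the matching function for SUM as multiple or none of them fit. Please use an explicit cast."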

A = LOAD '/output/*/part*' AS (id:chararray,first_count:chararray,second_count:chararray,name:chararray,desc:chararray);
B = GROUP A BY id;

C = FOREACH B GENERATE group as id,
              SUM(A.first_count) as first_count,
              SUM(A.second_count) as second_count,
              A.name as name,
              A.desc as desc;

1 Answer:

Answer 0 (score: 1)

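Your LOAD statement is wrong. first_count and second_count are loaded as chararray, and SUM cannot add strings. If you are sure these columns only ever contain numbers, load them as int instead. Try this: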

A = LOAD '/output/*/part*' AS (id:chararray,first_count:int,second_count:int,name:chararray,desc:chararray);

With the counts typed as int, SUM should resolve and the script should work.
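For reference, here is a minimal sketch of how the whole script might look with the corrected schema. It rests on a few assumptions that are not stated in the question: the input is tab-delimited (the PigStorage default), name and desc are constant for a given id, and the last field is renamed to description because desc is also the ORDER BY keyword and may not be accepted as a field name by every Pig version. Grouping on all three key fields is one way to get name and description back as plain values; projecting A.name from a group on id alone, as in the question, would return a bag of names instead.

```pig
-- Load the counts as int so that SUM can resolve to a numeric implementation.
-- Assumption: tab-delimited input, which is the PigStorage default.
A = LOAD '/output/*/part*' AS (id:chararray, first_count:int, second_count:int,
                               name:chararray, description:chararray);

-- Group on id, name and description together so the non-aggregated columns
-- come out as scalars rather than bags (assumes they are constant per id).
B = GROUP A BY (id, name, description);

-- 'group' is a tuple (id, name, description); project its fields back out
-- in the column order the question asks for.
C = FOREACH B GENERATE
        group.id           AS id,
        SUM(A.first_count)  AS first_count,
        SUM(A.second_count) AS second_count,
        group.name         AS name,
        group.description  AS description;

DUMP C;
```

On the sample rows from the question this should produce the sums 65 and 80 for ID 1, and 60 and 30 for ID 2. If the schema has to stay chararray for some reason, the alternative hinted at by the error message is to cast the fields to int in a FOREACH before the GROUP, then SUM the cast fields.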