What would be the equivalent Pig script for the following SQL query?
SELECT fld1, fld2, fld3, SUM(fld4)
FROM Table1
GROUP BY fld1, fld2, fld3;
For Table1:
A B C 2 X Y Z
A B C 3 X Y Z
A B D 2 X Y Z
A C D 2 X Y Z
A C D 2 X Y Z
A C D 2 X Y Z
Output:
A B C 5
A B D 2
A C D 6
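Answer 0 (score: 0)
Reference: https://pig.apache.org/docs/r0.11.1/basic.html#GROUP, where you can find an example of grouping on multiple fields.
For this use case, the following code should be enough: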
A = load 'input.csv' using PigStorage(',') AS (fld1:chararray,fld2:chararray,fld3:chararray,fld4:long,fld5:chararray,fld6:chararray,fld7:chararray);
B = FOREACH(GROUP A BY (fld1,fld2,fld3)) GENERATE FLATTEN(group) AS (fld1,fld2,fld3), SUM(A.fld4) AS fld4_aggr;
DUMP B;
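The one-line FOREACH above nests the GROUP inside it. As a sketch of the same logic written out step by step, the version below names the intermediate relation (G is a hypothetical alias, not part of the original answer) so that each stage can be inspected with DESCRIBE. It assumes, as the answer does, a comma-delimited file named 'input.csv' with seven columns.

```pig
-- Load the comma-separated input; file name and schema are taken from the answer above.
A = LOAD 'input.csv' USING PigStorage(',')
    AS (fld1:chararray, fld2:chararray, fld3:chararray, fld4:long,
        fld5:chararray, fld6:chararray, fld7:chararray);

-- Group on the three key columns. Each tuple of G holds the composite key
-- (named "group") and a bag containing the matching rows of A.
G = GROUP A BY (fld1, fld2, fld3);

-- Flatten the composite key back into three columns and sum fld4 over each bag.
B = FOREACH G GENERATE FLATTEN(group) AS (fld1, fld2, fld3), SUM(A.fld4) AS fld4_aggr;

DUMP B;
```

Run against the sample rows in the question (stored comma-separated), this should yield the three totals listed there, although Pig does not guarantee the order of the output tuples. Running DESCRIBE G; before the FOREACH shows the key-plus-bag structure that FLATTEN(group) and SUM(A.fld4) operate on.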