How to group by multiple columns in a Pig script

Asked: 2017-06-21 23:02:32

Tags: apache-pig

What would be the equivalent Pig script for the following SQL query?

SELECT fld1, fld2, fld3, SUM(fld4)
FROM Table1
GROUP BY fld1, fld2, fld3;

For Table1:

A    B   C  2    X   Y   Z
A    B   C  3    X   Y   Z
A    B   D  2    X   Y   Z
A    C   D  2    X   Y   Z
A    C   D  2    X   Y   Z
A    C   D  2    X   Y   Z

Expected output:

A    B   C  5           
A    B   D  2           
A    C   D  6           

1 answer:

Answer 0: (score: 0)

  

See https://pig.apache.org/docs/r0.11.1/basic.html#GROUP, which includes an example of grouping on multiple fields.

For this use case, the following code should be enough:

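-- Load the comma-delimited input and name all seven columns
A = LOAD 'input.csv' USING PigStorage(',') AS (fld1:chararray, fld2:chararray, fld3:chararray, fld4:long, fld5:chararray, fld6:chararray, fld7:chararray);
-- Group on the tuple (fld1, fld2, fld3), flatten the group key back into columns, and sum fld4 per group
B = FOREACH (GROUP A BY (fld1, fld2, fld3)) GENERATE FLATTEN(group) AS (fld1, fld2, fld3), SUM(A.fld4) AS fld4_aggr;
DUMP B;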
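
The nested one-liner can also be written as two separate steps, which makes the intermediate grouped relation easier to inspect. This is only a sketch under the same assumptions as the answer (a comma-delimited file named input.csv with these seven fields). The alias G and the DESCRIBE call are additions for illustration and are not part of the original answer.

```pig
-- Assumption: input.csv is comma-delimited and has the seven fields below
A = LOAD 'input.csv' USING PigStorage(',')
    AS (fld1:chararray, fld2:chararray, fld3:chararray, fld4:long,
        fld5:chararray, fld6:chararray, fld7:chararray);

-- Step 1: group on a tuple of three fields.
-- Each record of G has a tuple field named "group" (the key)
-- and a bag named "A" holding the matching input rows.
G = GROUP A BY (fld1, fld2, fld3);
DESCRIBE G;  -- prints the schema of G (illustrative; safe to remove)

-- Step 2: flatten the key tuple back into three columns
-- and sum fld4 over the bag of rows in each group.
B = FOREACH G GENERATE
        FLATTEN(group) AS (fld1, fld2, fld3),
        SUM(A.fld4) AS fld4_aggr;

DUMP B;
```

If the six sample rows from the question are stored as comma-separated values, this should yield the three tuples (A,B,C,5), (A,B,D,2), and (A,C,D,6), although DUMP does not guarantee their order. If the file is tab-delimited instead, calling PigStorage() with no argument uses tab as the delimiter.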