How do I use BagToTuple with GROUP in Pig?

Date: 2014-08-14 22:44:07

Tags: apache-pig

My data set is:

id company
1  a
1  b
2  c
2  a

The code I wrote is as follows:

record = load ....

grp = GROUP record BY id; 

newdata = FOREACH grp GENERATE group AS id, 
        COUNT(record) AS counts, 
        BagToTuple(record.company) AS company;

The output is:

id count company
1  2     a,b
2  2     c,a

But I want the companies within each group to be sorted. For example, I need a,c for id 2.

1 answer:

Answer 0 (score: 0)

Use a nested FOREACH:

newdata = FOREACH grp {
        sortedbag = ORDER record BY company;
        GENERATE group AS id,
                COUNT(sortedbag) AS counts,
                BagToTuple(sortedbag.company) AS company;
};

The sortedbag alias holds each group's records sorted by company in ascending order. To sort in descending order instead, change the statement to:

sortedbag = ORDER record BY company DESC;
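
For reference, here is a minimal end-to-end sketch that puts the pieces together. The question elides the LOAD statement, so the file name 'companies.tsv', the tab delimiter, and the column types below are assumptions for illustration only; adjust them to match your actual input.

```pig
-- Assumption: a hypothetical tab-separated file with two columns, id and company.
record = LOAD 'companies.tsv' USING PigStorage('\t')
         AS (id:int, company:chararray);

-- Collect all records that share the same id into one bag per group.
grp = GROUP record BY id;

-- Sort each group's bag inside the nested block, then flatten it to a tuple.
newdata = FOREACH grp {
    sortedbag = ORDER record BY company;
    GENERATE group AS id,
             COUNT(sortedbag) AS counts,
             BagToTuple(sortedbag.company) AS company;
};

DUMP newdata;
```

With the sample data from the question, the company tuple for id 2 should come out as a,c rather than c,a. The ORDER has to happen inside the nested block because the order of tuples within a grouped bag is otherwise not guaranteed.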