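I'm trying to produce output that sums the last two fields (count and books) and, for each group, divides them (count/books). Right now I have grouping code that groups by the first field of each tuple, but I don't know how to get the sums of the last two fields and then combine them. I've posted my code so far below. Thanks in advance!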
bigrams = LOAD 'txt' AS (bigram:chararray, year:int, count:int, books:int);
grouping = group bigrams by bigram;
STORE grouping INTO 's3://cse6242vrv3/output1.txt';
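To see why storing `grouping` directly gives no totals, here is a small Python model of what `group bigrams by bigram` produces. It is an illustration of the semantics, not Pig code, and the sample rows are made up. Each output record pairs the group key with a bag holding every input tuple that had that key, so the stored result contains nested bags rather than sums.

```python
from collections import defaultdict

def group_by_bigram(rows):
    """Model of GROUP bigrams BY bigram: key -> bag (list) of the full input tuples."""
    groups = defaultdict(list)
    for row in rows:
        # each row follows the schema (bigram, year, count, books)
        groups[row[0]].append(row)
    return dict(groups)

# Hypothetical sample rows, not real data.
rows = [
    ("big data", 2010, 5, 2),
    ("big data", 2011, 7, 3),
    ("pig latin", 2011, 4, 4),
]

for key, bag in group_by_bigram(rows).items():
    print(key, bag)
```

The sums still have to be computed from each bag in a `foreach ... generate` step.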
Answer 0 (score: 1)
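It isn't entirely clear what you expect the output to look like, so I'll assume you just want to know how to do aggregation in Pig. If you're looking for something different, please tell us more.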
bigrams = LOAD 'txt' AS (bigram:chararray, year:int, count:int, books:int);
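-- one output record per distinct bigram, summed over all years
grouping = foreach (group bigrams by bigram) generate group AS bigram,
SUM(bigrams.count) AS sum_count,
SUM(bigrams.books) AS sum_books,
-- SUM over int fields returns long, so cast to double to avoid integer division
(double)SUM(bigrams.count) / SUM(bigrams.books) AS ratio;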
STORE grouping INTO 's3://cse6242vrv3/output1.txt';
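As a sanity check of what the script computes per bigram, here is a Python model of the `foreach ... generate` step. It is an illustration only, with made-up sample rows; returning `None` when a bigram has zero books is a hypothetical choice for this sketch, not something the Pig script does.

```python
from collections import defaultdict

def aggregate(rows):
    """Model of: foreach (group bigrams by bigram) generate group, SUM(count), SUM(books), ratio."""
    sums = defaultdict(lambda: [0, 0])  # bigram -> [sum_count, sum_books]
    for bigram, _year, count, books in rows:
        sums[bigram][0] += count
        sums[bigram][1] += books
    result = []
    for bigram, (sum_count, sum_books) in sums.items():
        # true division, matching the (double) cast in the Pig script;
        # None when there are no books (hypothetical choice for this sketch)
        ratio = sum_count / sum_books if sum_books else None
        result.append((bigram, sum_count, sum_books, ratio))
    return result

# Hypothetical sample rows, not real data.
rows = [
    ("big data", 2010, 5, 2),
    ("big data", 2011, 7, 3),
    ("pig latin", 2011, 4, 4),
]

for record in aggregate(rows):
    print(record)
```

For "big data" the totals are 12 and 5, so integer division would give 2 while the double version gives 2.4, which is why the cast matters.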
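You can find more details on aggregation in Pig here: https://pig.apache.org/docs/r0.15.0/basic.html#group. Another feature you might find useful is the nested block (a nested foreach), which lets you perform more complex per-group calculations after a group by: https://pig.apache.org/docs/r0.15.0/basic.html#nestedblock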
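A nested foreach lets you run statements such as a `filter` on each group's bag before the `generate` step, for example `recent = filter bigrams by year >= 2011;` inside the braces. The Python sketch below models that idea. The year cutoff is a hypothetical example, not part of the question, and the sample rows are made up.

```python
from collections import defaultdict

def aggregate_recent(rows, min_year):
    """Model of a nested foreach: filter each group's bag, then aggregate what is left."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[0]].append(row)  # group by bigram
    result = {}
    for bigram, bag in bags.items():
        # inner statement, analogous to: recent = filter bigrams by year >= min_year;
        recent = [r for r in bag if r[1] >= min_year]
        # generate step, analogous to SUM(recent.count), SUM(recent.books)
        # (Python's sum returns 0 for an empty list; Pig's SUM may treat an empty bag differently)
        result[bigram] = (sum(r[2] for r in recent), sum(r[3] for r in recent))
    return result

# Hypothetical sample rows, not real data.
rows = [
    ("big data", 2010, 5, 2),
    ("big data", 2011, 7, 3),
    ("pig latin", 2011, 4, 4),
]

print(aggregate_recent(rows, 2011))
```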