Using SUM() in Pig Latin

Time: 2014-12-04 14:15:17

Tags: loops sum apache-pig

I've just started writing scripts in Pig, and I'm trying to sum an int column. My script looks like this:

DATA = LOAD 'SomeFile' as (fingerPrint, size, str1, str2);
groupedChunks = GROUP DATA BY fingerPrint;


uniqueChunks = FILTER groupedChunks BY COUNT(DATA)==1;
sizes = FOREACH uniqueChunks GENERATE MAX(DATA.size) as size;

Now I have a table with a single column, the size column. Calling DESCRIBE on it produces this output: sizes: {size: int}

This is the step I need help with: how do I get the SUM of all the sizes in this column?

2 Answers:

Answer 0 (score: 1)

Can you try this?

result = FOREACH (GROUP sizes ALL) GENERATE SUM(sizes);
DUMP result;
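
SUM() is an aggregate that operates on a bag, so the rows of sizes first have to be collected into a single bag; GROUP ... ALL does that, and the resulting bag takes the name of the grouped relation. The same step can be sketched with the column projected explicitly. The aliases allSizes, sizeSum and totalSize are made up for illustration, and sizes is assumed to be the single-column relation from the question:

```pig
-- Assumption: `sizes` is the one-column relation built in the question.
-- GROUP ... ALL puts every row into one group whose bag is named `sizes`.
allSizes = GROUP sizes ALL;

-- Project the `size` field out of the bag and aggregate it.
sizeSum = FOREACH allSizes GENERATE SUM(sizes.size) AS totalSize;

DUMP sizeSum;
```

Projecting sizes.size makes it explicit which field is being summed, which matters as soon as the relation has more than one column.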

Update: full code

input.txt:

a       1       b       c
d       2       e       f

PigScript:

DATA = LOAD 'input.txt' as (fingerPrint, size, str1, str2);
groupedChunks = GROUP DATA BY fingerPrint;
uniqueChunks = FILTER groupedChunks BY COUNT(DATA)==1;
sizes = FOREACH uniqueChunks GENERATE MAX(DATA.size) as size;
result = FOREACH (GROUP sizes ALL) GENERATE SUM(sizes);
DUMP result;

Output:

(3.0)
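
The result is printed as 3.0 because the LOAD statement declares no types: untyped fields default to bytearray, and SUM() treats bytearray input as double. If an integral total is wanted, one option is to declare the schema at load time. This is only a sketch; it assumes input.txt is tab-delimited (the default for PigStorage), and the chararray types and the totalSize name are assumptions rather than part of the original script:

```pig
-- Assumption: input.txt is tab-delimited, matching PigStorage's default.
-- Declaring size as int keeps it from defaulting to bytearray.
DATA = LOAD 'input.txt' AS (fingerPrint:chararray, size:int, str1:chararray, str2:chararray);

-- Keep only the fingerprints that occur exactly once.
groupedChunks = GROUP DATA BY fingerPrint;
uniqueChunks = FILTER groupedChunks BY COUNT(DATA) == 1;

-- Each remaining group holds a single row, so MAX simply extracts its size.
sizes = FOREACH uniqueChunks GENERATE MAX(DATA.size) AS size;

-- Collect all sizes into one bag and sum them.
result = FOREACH (GROUP sizes ALL) GENERATE SUM(sizes.size) AS totalSize;

DESCRIBE result;
DUMP result;
```

With size declared as int, SUM() should return a long instead of a double, and DESCRIBE result is a quick way to confirm the type before dumping.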

Answer 1 (score: 0)

V = GROUP DATA ALL;
result = FOREACH V GENERATE SUM(DATA.size);