I have just started writing some scripts in Pig, and I am trying to work with an int column. My script looks like this:
DATA = LOAD 'SomeFile' as (fingerPrint, size, str1, str2);
groupedChunks = GROUP DATA BY fingerPrint;
uniqueChunks = FILTER groupedChunks BY COUNT(DATA)==1;
sizes = FOREACH uniqueChunks GENERATE MAX(DATA.size) as size;
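Now I have a relation with just one column: the size column, if you will.
Calling DESCRIBE on it produces this output: sizes:{size: int}
This is the step I need help with: how do I get the SUM of all the sizes in this column?
Answer 0 (score: 1)
Can you try this?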
result = FOREACH (GROUP sizes ALL) GENERATE SUM(sizes);
DUMP result;
Update: full code
input.txt:
a 1 b c
d 2 e f
PigScript:
DATA = LOAD 'input.txt' as (fingerPrint, size, str1, str2);
groupedChunks = GROUP DATA BY fingerPrint;
uniqueChunks = FILTER groupedChunks BY COUNT(DATA)==1;
sizes = FOREACH uniqueChunks GENERATE MAX(DATA.size) as size;
result = FOREACH (GROUP sizes ALL) GENERATE SUM(sizes);
DUMP result;
Output:
(3.0)
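If you want an integer result rather than a double, one option is to declare types in the LOAD schema. This is only a minimal sketch, assuming input.txt is tab-separated (the PigStorage default delimiter) and that size always holds an integer. The alias allSizes and the field name total are names made up for illustration.

```pig
-- Sketch: same pipeline with an explicit schema (assumes tab-separated input.txt).
DATA = LOAD 'input.txt' USING PigStorage('\t')
       AS (fingerPrint:chararray, size:int, str1:chararray, str2:chararray);
groupedChunks = GROUP DATA BY fingerPrint;
-- Keep only the fingerprints that occur exactly once.
uniqueChunks = FILTER groupedChunks BY COUNT(DATA) == 1;
sizes = FOREACH uniqueChunks GENERATE MAX(DATA.size) AS size;
-- GROUP ... ALL collects every row into one bag, so SUM sees the whole column.
allSizes = GROUP sizes ALL;
result = FOREACH allSizes GENERATE SUM(sizes.size) AS total;
DUMP result;
```

Without declared types the fields load as bytearray, and MAX and SUM cast bytearray values to double, which is why the untyped script prints (3.0). With size declared as int, SUM returns a long instead.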
Answer 1 (score: 0)
-- Note: this sums size over every row of DATA, including duplicated fingerprints.
V = GROUP DATA ALL;
result = FOREACH V GENERATE SUM(DATA.size);