COUNT after JOIN and GROUP BY in Pig

Time: 2017-06-06 21:37:16

Tags: hdfs apache-pig

I am new to Pig and am trying to understand why I cannot count after a join and a group:

A = LOAD 'mary' as (line);
B = LOAD 'mary' as (line);

wordsA = foreach A generate flatten(TOKENIZE(line)) as wordA;
grpdA = group wordsA by wordA;
cntdA = foreach grpdA generate group, COUNT(wordsA);

wordsB = foreach B generate flatten(TOKENIZE(line)) as wordB;
grpdB = group wordsB by wordB;
cntdB = foreach grpdB generate group, COUNT(wordsB), 'some text';

fltB = FILTER cntdB BY $1>1;

jnd = join cntdA by $1, fltB by $1;
jnd_n = foreach jnd generate $0;
grp = group jnd by $0;
out = foreach grp generate group, count(jnd_n);

dump jnd_n;
dump grp;

dump jnd_n:

(was)
(was)
(was)
(lamb)
(lamb)
(lamb)
(Mary)
(Mary)
(Mary)

dump grp:

(was,{(was,2,was,2,some text),(was,2,Mary,2,some text),(was,2,lamb,2,some text)})
(Mary,{(Mary,2,was,2,some text),(Mary,2,Mary,2,some text),(Mary,2,lamb,2,some text)})
(lamb,{(lamb,2,was,2,some text),(lamb,2,Mary,2,some text),(lamb,2,lamb,2,some text)})

But I get this error:

Invalid scalar projection: jnd_n : A column needs to be projected from a relation for it to be used as a scalar

If I try changing the code to:

out = foreach grp generate group, count(jnd_n.$0);

then I get a different error:

  

Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

I know I could do this another way, but I would like to get the following result after the two Pig operations, JOIN and GROUP BY.

out:

(was,3)
(lamb,3)
(Mary,3)

1 Answer:

Answer 0 (score: 0)

COUNT needs to be capitalized:

out = foreach grp generate group, COUNT(jnd_n.$0);

COUNT is a built-in function, and function names in Pig are case-sensitive.
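Capitalization aside, note that `grp` is built from `jnd`, so the bag inside each group is named `jnd`, not `jnd_n`. Referring to `jnd_n` inside that `foreach` makes Pig treat it as a scalar, which is what the first error complains about. Below is a minimal sketch of one way to get the desired counts, assuming `jnd` is defined exactly as in the question. The relation names are the asker's; the alias `word` is introduced here purely for illustration.

```pig
-- Keep only the first column (the word) of the joined relation.
-- The alias "word" is a hypothetical name added for readability.
jnd_n = foreach jnd generate $0 as word;

-- Group the projected relation itself, so the bag in each group is named jnd_n.
grp = group jnd_n by word;

-- COUNT is uppercase because Pig function names are case-sensitive.
-- It takes the bag, not a scalar projection such as jnd_n.$0.
out = foreach grp generate group, COUNT(jnd_n);

dump out;
```

With the `dump jnd_n` output shown in the question (three rows per word), this should give a count of 3 for each of was, lamb, and Mary; the row order of `dump` is not guaranteed. Alternatively, keeping `grp = group jnd by $0;` unchanged and writing `COUNT(jnd)` counts the same tuples.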
