How to use the bincond operator with grouping in Pig

Time: 2016-11-08 09:17:57

Tags: hadoop apache-pig

I need to group the data below by fname and lname.

(FNAME,L-NAME,ID)

abc,xyz,I
abc,xyz,N
ppp,xxx,I
ppp,XXX,I

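The id field is expected to contain only two values, N or I. If the same (fname, lname) combination has both N and I, I should use N as the id; otherwise I should use the id value that appears in the group.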

I expect the following result:

abc,xyz,N
ppp,xxx,I

I tried the code below and it works fine:

in =load '/testing/name.txt' USING PigStorage(',') as (fname:chararray,lname:chararray,id:chararray);

grp = group in by (fname,lname);

z = foreach grp generate FLATTEN(group) AS (fname,lname),(COUNT(in.id) >1 ? ('N') :BagToTuple(in.id))as id;

But now I need to check the value of the id field instead of the count:

z = foreach grp generate FLATTEN(group) AS (fname,lname),((in.id == 'N' or in.id == 'I') ? ('N') :BagToTuple(in.id))as id;

However, it gives the following error:

(Name: Equal Type: null Uid: null)incompatible types in Equal Operator left hand side:bag :tuple(id:chararray)  right hand side:chararray

I also get the following error:

Two inputs of BinCond must have compatible schemas. left hand side: #31:tuple(#32:chararray) right hand side: org.apache.pig.builtin.bagtotuple_3#35:tuple(id#36:int)

Please advise.

1 answer:

Answer 0: (score: 0)

Are you loading a field that contains characters (N, I) into an int column? Change the load statement so that the id column has type chararray.

in =load '/testing/name.txt' USING PigStorage(',') as (fname:chararray,lname:chararray,id:chararray);
grp = group in by (fname,lname);
z = foreach grp generate FLATTEN(group) AS (fname,lname),(COUNT(in.id) > 1 && in.id matches 'N') ? ('N') : in.id;
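
The first error in the question arises because in.id is a bag, while 'N' is a chararray, so the two cannot be compared directly inside a bincond; the expression in the answer above appears to compare the bag in the same way. One possible way around this, shown here only as a sketch, is a nested FOREACH that filters the group for rows with id 'N' and tests whether that filtered bag is empty. The alias names `data` and `n_rows` are illustrative and not from the original post (`data` replaces `in`, which may clash with the IN operator in newer Pig versions), and the sketch assumes id only ever holds N or I, as the question states.

```pig
-- Sketch only: `data` and `n_rows` are illustrative alias names.
data = LOAD '/testing/name.txt' USING PigStorage(',')
       AS (fname:chararray, lname:chararray, id:chararray);

grp = GROUP data BY (fname, lname);

z = FOREACH grp {
    -- Keep only the rows of this group whose id is 'N'.
    n_rows = FILTER data BY id == 'N';
    -- The bincond now tests a boolean, and both branches are chararray,
    -- so the two sides have compatible schemas.
    GENERATE FLATTEN(group) AS (fname, lname),
             (IsEmpty(n_rows) ? 'I' : 'N') AS id;
};

DUMP z;
```

Note that grouping is case-sensitive, so ppp,xxx and ppp,XXX would land in separate groups unless the key is normalized first (for example with LOWER(lname)). Since 'N' sorts after 'I', MAX(data.id) might also serve as a shorter alternative under the same two-value assumption.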