I need to group the data below by fname and lname.
(FNAME,L-NAME,ID)
abc,xyz,I
abc,xyz,N
ppp,xxx,I
ppp,XXX,I
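In the id field I expect only two values, N or I. If the same (fname, lname) combination appears with both N and I, the id should be N; otherwise the id should be whatever value appears in the group.
I am expecting the following result: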
abc,xyz,N
ppp,xxx,I
I tried the code below and it works fine:
in =load '/testing/name.txt' USING PigStorage(',') as (fname:chararray,lname:chararray,id:chararray);
grp = group in by (fname,lname);
z = foreach grp generate FLATTEN(group) AS (fname,lname),(COUNT(in.id) >1 ? ('N') :BagToTuple(in.id))as id;
But now I need to check the value of the id field instead of the count:
z = foreach grp generate FLATTEN(group) AS (fname,lname),((in.id == 'N' or in.id == 'I') ? ('N') :BagToTuple(in.id))as id;
However, it gives the following error:
(Name: Equal Type: null Uid: null)incompatible types in Equal Operator left hand side:bag :tuple(id:chararray) right hand side:chararray
It also gives the following error:
Two inputs of BinCond must have compatible schemas. left hand side: #31:tuple(#32:chararray) right hand side: org.apache.pig.builtin.bagtotuple_3#35:tuple(id#36:int)
Please advise.
Answer 0: (score: 0)
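Are you loading a field that contains characters (N, I) into an int column? Change the load statement so that the id column is typed as chararray. Also note that in.id is a bag, so it cannot be compared directly with a chararray; filter the bag inside a nested foreach instead.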
in =load '/testing/name.txt' USING PigStorage(',') as (fname:chararray,lname:chararray,id:chararray);
grp = group in by (fname,lname);
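z = foreach grp {
    -- keep only the rows of this group whose id is 'N'
    n_rows = filter in by id == 'N';
    -- id holds only N or I, so a group without any N must be I
    generate FLATTEN(group) AS (fname,lname), (IsEmpty(n_rows) ? 'I' : 'N') AS id;
};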
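A shorter alternative, offered as a sketch rather than the definitive fix: because id can only be N or I, and 'N' sorts after 'I' as a string, taking the MAX of the ids in each group should yield N whenever the group contains an N and I otherwise. The alias `names` is my own choice for illustration; the path and schema are copied from the question.

```pig
-- Assumption: `names` is an illustrative alias; path and schema come from the question.
names = LOAD '/testing/name.txt' USING PigStorage(',')
        AS (fname:chararray, lname:chararray, id:chararray);
grp = GROUP names BY (fname, lname);
-- MAX over a bag of chararrays compares strings, and 'N' sorts after 'I',
-- so the result is 'N' if the group contains any N and 'I' otherwise.
z = FOREACH grp GENERATE FLATTEN(group) AS (fname, lname), MAX(names.id) AS id;
DUMP z;
```

Note that grouping on chararray keys is case-sensitive, so ppp,xxx and ppp,XXX in the sample data end up in separate groups. If they are meant to be the same name, grouping by (fname, LOWER(lname)) is one way to merge them.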