Here is the code that causes my problem:
a = LOAD 'tellers' using TextLoader() AS line;
-- convert a to chararray
b = foreach a generate (chararray)line;
-- run through my UDF to create tuples
c = foreach b generate myudfs.TellerParser5(line); -- ({(20),(5),(5),(10),(1),(1),(1),(1),(1),(5),(10),(10),(10)})....
d = foreach c generate flatten(number);
e = group d by number; -- e: {group: chararray,d: {(number: chararray)}}
f = foreach e generate group, COUNT(d); -- f: {group: chararray,long}
In relation f, I have a tuple with an empty key, (,1), that I want to filter out.
dump f;
(,1)
(1,97)
(5,49)
(10,87)
(20,24)
describe f;
f: {group: chararray,long}
I tried this without success (nothing changed):
remove_tuple = filter f BY group is not null;
Answer 0 (score: 0)
group is a Pig keyword. Hopefully this works if some other word is used as the field name.
Answer 1 (score: 0)
You can use !='null' as the condition to filter out NULL. I used the following as input:
(,1)
(1,97)
(5,49)
(10,87)
(20,24)
Here is how to filter out the NULL:
A = LOAD 'file' using PigStorage(',') AS (a:chararray,b:long);
B = FILTER A BY a!='null';
DUMP B;
So for your script, the line would be something like:
remove_tuple = filter f BY group!='null';
Output:
(1,97)
(5,49)
(10,87)
(20,24)
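A sketch that covers both possibilities, assuming the blank key may be either a null or an empty string (a chararray UDF can emit '', and IS NOT NULL does not match ''). In Pig a comparison with a null operand evaluates to null and FILTER drops such rows, which is likely why !='null' removes a null key; an empty string, however, still passes !='null'. The file name 'counts.txt' and the aliases grp and cnt are hypothetical.

```pig
-- Hypothetical input: 'counts.txt', comma-separated rows such as ",1" and "1,97".
f = LOAD 'counts.txt' USING PigStorage(',') AS (grp:chararray, cnt:long);

-- IS NOT NULL rejects a null key; != '' rejects an empty-string key.
clean = FILTER f BY grp IS NOT NULL AND grp != '';

DUMP clean;
```

Against the question's relation, the equivalent line would be remove_tuple = filter f BY group is not null and group != '';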
Answer 2 (score: 0)
I solved it by adding one step that casts the value to int. Here are the steps:
e = foreach d generate (int)$0 AS number; -- this is the key added step
f = group e by number; -- f: {group: int,e: {(number: int)}}
g = foreach f generate group, COUNT(e); -- g: {group: int,long}
h = foreach f generate group, SUM(e);
i = filter g by $0 is not null;
dump i;
(1,97)
(5,49)
(10,87)
(20,24)
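The cast above likely works because, in Pig, a chararray that cannot be converted to int (an empty string included) is cast to null with a warning rather than an error, so the blank key becomes a true null that IS NOT NULL can catch. A minimal sketch of doing the cast inside the FILTER itself, so the key stays a chararray in the output, assuming a hypothetical comma-separated file 'counts.txt' and hypothetical aliases grp and cnt:

```pig
-- Hypothetical input: 'counts.txt', comma-separated rows such as ",1" and "1,97".
f = LOAD 'counts.txt' USING PigStorage(',') AS (grp:chararray, cnt:long);

-- (int)'' fails to convert and yields null, so blank keys are dropped here.
-- Caveat: any other non-numeric key would be dropped as well.
numeric_keys = FILTER f BY (int)grp IS NOT NULL;

DUMP numeric_keys;
```

This trades generality for brevity: it is only safe when every valid key is numeric, as the teller amounts in the question appear to be.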