Removing null values from a group?

Time: 2013-02-03 06:29:05

Tags: hadoop apache-pig

I have a relation with the following schema:

(id:int,names:chararray)

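I group by id, which produces a bag of names for each id. I see that the bag of names can contain a null value. How do I remove the nulls from the bag?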

1 answer:

Answer 0: (score: 1)

You can remove tuples from the bag created by GROUP BY with a FILTER nested inside a FOREACH.

inpt = LOAD '...' as (id: int, names: chararray);
grp = GROUP inpt BY id;
result = FOREACH grp {
    no_nulls = FILTER inpt BY names is not null;
    GENERATE group, no_nulls;
};

Or simply filter out the null names before grouping:

inpt = LOAD '...' as (id: int, names: chararray);
no_nulls = FILTER inpt BY names is not null;
grp = GROUP no_nulls BY id;
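
The two approaches differ in one case: an id whose names are all null. The sketch below runs both side by side to show this. The file name 'names.tsv', its contents, and the aliases nested, pre and pre_grp are made up for illustration, and it assumes the default PigStorage loader reads an empty or missing field as null.

```pig
-- Hypothetical tab-separated input 'names.tsv' (assumed for illustration):
--   1<TAB>alice
--   1<TAB>          (empty name, assumed to load as null)
--   2<TAB>          (id 2 has only a null name)
inpt = LOAD 'names.tsv' AS (id: int, names: chararray);

-- Approach 1: filter inside each group.
-- Every id still produces one output row; the bag is empty when all names were null.
grp = GROUP inpt BY id;
nested = FOREACH grp {
    no_nulls = FILTER inpt BY names is not null;
    GENERATE group, no_nulls;
};

-- Approach 2: filter before grouping.
-- An id with only null names has no rows left, so it produces no group at all.
pre = FILTER inpt BY names is not null;
pre_grp = GROUP pre BY id;

DUMP nested;   -- includes a row for id 2 with an empty bag
DUMP pre_grp;  -- has no row for id 2
```

If every id must appear in the output, the nested FILTER is probably the one to use; if ids without any name can be dropped, filtering first passes fewer rows to the GROUP.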