Pig 10: filtering a chararray by "is not null" does not work

Time: 2014-12-12 22:00:54

Tags: filter apache-pig

I'm new to Pig. I've been playing with it and hit a roadblock.

Imagine I have the following:

dump test;

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
(3,null)
(4,null)

I want to filter "test" to remove the null values, so I do this:

filter_test = filter test by test.column2 is not null;

I expect that to give me something like this:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)

But it returns the same thing. It does not remove the null rows.

I'm using Pig 10, and the date column is of type chararray.

Thanks for your help.

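1 Answer:

Answer 0: (score: 1)

Your column2 does not contain null values; it is a chararray, and the rows you want to drop hold the literal string "null". See the two examples below: one where null is a chararray string, and one with actual null values.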

Example 1: null as a chararray
input.txt

1,2014-04-08 12:09:23.0
2,2014-04-08 12:09:23.0
3,null
4,null

Pig:

A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);
B = FILTER A BY f2!='null';
DUMP B;

Output:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)

Example 2: actual null values
input.txt

1,2014-04-08 12:09:23.0
2,2014-04-08 12:09:23.0
3,
4,

Pig:

A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int,f2:chararray);
B = FILTER A BY f2 is not null;
DUMP B;

Output:

(1,2014-04-08 12:09:23.0)
(2,2014-04-08 12:09:23.0)
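
If the data might contain both kinds of rows (the literal string 'null' and genuinely empty fields), the two filters above can be combined. The following is only a sketch under that assumption, reusing the same hypothetical input.txt layout and the field names f1 and f2 from the examples; it has not been run against the asker's actual data.

```pig
-- Sketch: keep only rows where f2 is neither a real null
-- nor the literal text 'null'.
A = LOAD 'input.txt' USING PigStorage(',') AS (f1:int, f2:chararray);
B = FILTER A BY (f2 is not null) AND (f2 != 'null');
DUMP B;
```

The explicit "is not null" check is there for readability: a comparison against a null field evaluates to null in Pig, and FILTER drops such rows anyway, but spelling out both conditions makes the intent clear.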