Unable to filter NULL values in Apache Pig

Asked: 2017-04-28 11:36:55

Tags: hadoop apache-pig

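I am trying to filter out NULL and empty fields from a CSV file in Pig. I use CSVExcelStorage to load the data and skip the header. Below is the Pig script I tried.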

REGISTER /usr/lib/pig/piggybank.jar;
inp = load 'test.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',','YES_MULTILINE','NOCHANGE','SKIP_INPUT_HEADER');
a = foreach inp generate (INT)$0 as id, (CHARARRAY)$1 as name, (CHARARRAY)$2 as dept;
b = filter a by (id is not null) AND (name is not null) AND NOT(name MATCHES '') AND (dept is not null) ;

Sample input:

id,name,dept
1,Avy,NULL
2,,CS
3,Sam,Mech

After I run dump b, this is the output:

(1,Avy,NULL)
(3,Sam,Mech)

Ideally, I don't want the first record either, because it contains NULL. Can anyone suggest a fix?

1 answer:

Answer 0: (score: 1)

In the end, this worked for me!

b = filter a by (id is not null) AND (name is not null) AND NOT(name MATCHES '') AND (dept!= 'NULL');

Thanks, guys!
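
A note on why the comparison against 'NULL' is needed: the CSV file holds the four-character string NULL in the dept column, and Pig loads it as an ordinary chararray, so `is not null` evaluates to true for that record. The sketch below is one possible generalization, not the original poster's script. It assumes the same test.csv and piggybank.jar path as in the question, and it additionally assumes that placeholders such as "null" or " NULL " (different case or surrounding whitespace) should also be dropped, which the question does not state. TRIM and UPPER are Pig built-in string functions.

```pig
-- Sketch only: same input file and jar path as in the question.
REGISTER /usr/lib/pig/piggybank.jar;

inp = LOAD 'test.csv'
      USING org.apache.pig.piggybank.storage.CSVExcelStorage(
          ',', 'YES_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER');

a = FOREACH inp GENERATE (int)$0 AS id,
                         (chararray)$1 AS name,
                         (chararray)$2 AS dept;

-- Keep a record only if every field is a real value:
--   * not a true Pig null,
--   * not empty or whitespace-only,
--   * not the literal placeholder string NULL (any case; an assumption).
b = FILTER a BY (id IS NOT NULL)
    AND (name IS NOT NULL) AND (TRIM(name) != '') AND (UPPER(TRIM(name)) != 'NULL')
    AND (dept IS NOT NULL) AND (TRIM(dept) != '') AND (UPPER(TRIM(dept)) != 'NULL');

DUMP b;
```

With the sample input from the question, record 1 should be dropped by the 'NULL' string check on dept and record 2 by the null/empty check on name, leaving only record 3. The explicit `IS NOT NULL` tests are kept for readability; a comparison such as `dept != 'NULL'` already evaluates to null for a null dept, and FILTER discards records whose condition is not true.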