Pig filter not working

Date: 2017-02-10 06:36:21

Tags: hadoop mapreduce apache-pig

I have the following Pig script:

meta_file = LOAD 'meta_file' USING PigStorage(',');

DUMP meta_file;

meta = FOREACH meta_file GENERATE (chararray)$0 AS is_vta:chararray, (chararray)$1 AS id:long;

DUMP meta;

new_d = FILTER meta BY (is_vta == 't');
DUMP new_d;

Contents of meta_file:

"t","7181397"
"t","6331589"
"f","7266217"
"t","6051440"
"t","6901437"
"t","6805292"
"f","7144764"
"t","6820265"
"f","7515321"
"t","4777938"

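The DUMP of meta_file looks fine and matches the file contents, and so does the DUMP of meta, but new_d is empty. I can see that is_vta in meta has the value t, yet new_d is still empty. Why are the elements not filtered correctly? What am I doing wrong here? I am new to Pig Latin and cannot figure out what is going wrong.

Thanks for your help.

2 answers:

Answer 0: (score: 1)

I think the quotes are causing the problem. PigStorage splits on the delimiter but does not strip quotes, so the field holds "t" (with the quotes), not t. Here are two ways to handle them: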

1: Use piggybank to handle the quotes. The rest of your script should then work.

REGISTER 'piggybank.jar';  -- CSVExcelStorage in this jar handles quoted fields by default

A = LOAD 'fil.csv'  using org.apache.pig.piggybank.storage.CSVExcelStorage(',') as (---Your Schema --- );
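Applied to the question's data, the piggybank approach might look like the sketch below. The jar path is an assumption (adjust it to your installation), and the schema is taken from the field names used in the question.

```pig
-- Path to piggybank.jar is an assumption; point it at your own copy.
REGISTER 'piggybank.jar';

-- CSVExcelStorage strips the surrounding double quotes while loading,
-- so is_vta holds t rather than "t".
meta = LOAD 'meta_file'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
       AS (is_vta:chararray, id:long);

new_d = FILTER meta BY is_vta == 't';
DUMP new_d;
```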

Or 2:

Use a regular expression to trim the quotes. See the related question "Remove single quotes from data using Pig".
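Before choosing either approach, it may help to confirm that the quotes really are part of the field. A minimal sketch, assuming the same meta_file as in the question (the aliases raw and sizes are made up for illustration):

```pig
-- Load both columns as plain strings, without any quote handling.
raw = LOAD 'meta_file' USING PigStorage(',') AS (is_vta:chararray, id:chararray);

-- SIZE on a chararray returns the number of characters in the value.
sizes = FOREACH raw GENERATE is_vta, SIZE(is_vta) AS len;
DUMP sizes;
```

If len comes out as 3 rather than 1, the stored value is "t" including its quotes, which is why comparing it with 't' never matches.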

Answer 1: (score: 1)

The simple way:

new_d = FILTER meta BY is_vta MATCHES '.*t.*';
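Note that '.*t.*' matches any value that contains a t anywhere. If the values are known to be exactly "t" or "f" with their quotes, a stricter variant is to compare against the quoted literal. This is a sketch under that assumption:

```pig
-- Compare against the value exactly as stored, quotes included.
new_d = FILTER meta BY is_vta == '"t"';
DUMP new_d;
```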

Another solution is to strip the quotes first:

remquotes = FOREACH meta GENERATE REPLACE($0, '\\"', '') AS is_vta:chararray, id;

new_d = FILTER remquotes BY is_vta == 't';
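The snippet above leaves id still wrapped in quotes. If id is needed as a number, both columns can be cleaned in one pass. The following is a sketch, not a tested script: it assumes every id is a quoted integer, and the alias clean is made up for illustration.

```pig
-- Load both columns as strings so the quotes can be removed explicitly.
meta_file = LOAD 'meta_file' USING PigStorage(',')
            AS (is_vta:chararray, id:chararray);

-- REPLACE takes a regular expression; '\\"' matches a double quote.
clean = FOREACH meta_file GENERATE
            REPLACE(is_vta, '\\"', '') AS is_vta,
            (long) REPLACE(id, '\\"', '') AS id;

new_d = FILTER clean BY is_vta == 't';
DUMP new_d;
```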