FILTER after FOREACH does not work

Time: 2013-12-18 23:03:41

Tags: java hadoop apache-pig cloudera

For some reason, adding a FILTER to the statements below causes errors. In the console output I see Failed to read data from "...". In the log I found this:

Backend error message
---------------------
java.lang.NullPointerException
    at org.apache.pig.builtin.Utf8StorageConverter.consumeTuple(Utf8StorageConverter.java:185)
    at org.apache.pig.builtin.Utf8StorageConverter.consumeBag(Utf8StorageConverter.java:94)
    at org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:331)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:1562)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:228)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:282)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:416)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:3

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias limited

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias limited
    at org.apache.pig.PigServer.openIterator(PigServer.java:838)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:604)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: Couldn't retrieve job.
    at org.apache.pig.PigServer.store(PigServer.java:902)
    at org.apache.pig.PigServer.openIterator(PigServer.java:813)
    ... 12 more

This is the code I am using:

--- Read the input 
records = LOAD 'data' AS (id1, id2, link, tags:bag{}, dates); 

counted = FOREACH records GENERATE (chararray) id1, (int) COUNT(tags) as amountOfTags;

filtered = FILTER counted BY amountOfTags > 0;

limited = limit filtered 10;

--- Save the result 
dump limited;

Everything works until I add the filtered = ... line and try to dump the result.

Can anyone tell me why?

0 answers