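Pig fails mysteriously when I specify the root of a large directory tree as the LOAD input in my script. The backend error exception it throws gives no insight into what went wrong. The same script works fine when there are fewer files.
It is a very simple script, shown below: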
SET pig.noSplitCombination true;
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',');
filtered = FILTER raw_record by $1 == 251068;
filtered_data = FOREACH filtered GENERATE (chararray)$0, (chararray)$1, (chararray)$2;
STORE filtered_data INTO '/data/output/directory/' USING PigStorage();
Here is the error message I see:
ERROR 2244: Job scope-594 failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job scope-594 failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:178)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:232)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:608)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
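How many files can Pig process at once?
Answer 0 (score: 0)
Pig can handle any number of files; it imposes no limit on how many it processes. In your case, try declaring a data type for each field when you load the data, and try quoting the value in the FILTER statement.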
-- Declare one name:type entry per field in your data; only two are shown here.
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',') AS (col1:chararray, col2:chararray);
filtered = FILTER raw_record BY $1 == '251068';
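For reference, here is a minimal sketch of how the two suggestions might look when applied to the whole script from the question. It assumes the data has three comma-separated fields, since the original script reads $0 through $2. The names col1, col2, and col3 are placeholders, not the real column names, and this has not been verified against the failing data set.

```pig
-- Kept unchanged from the question's script.
SET pig.noSplitCombination true;

-- Assumption: three fields per record; col1..col3 are hypothetical names.
raw_record = LOAD '/data/directory/tree/root' USING PigStorage(',')
    AS (col1:chararray, col2:chararray, col3:chararray);

-- col2 is declared as chararray, so compare it against a quoted string.
filtered = FILTER raw_record BY col2 == '251068';

-- The fields are already typed by the schema, so no explicit casts are needed here.
filtered_data = FOREACH filtered GENERATE col1, col2, col3;

STORE filtered_data INTO '/data/output/directory/' USING PigStorage();
```

With the schema declared at load time, the (chararray) casts in the FOREACH become unnecessary, and the fields can be referenced by name instead of by position.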
If you still get the error, please provide some sample data.