Pig job cannot read data from HDFS (ERROR 1066)

Date: 2015-08-25 17:34:21

Tags: hadoop apache-pig

I am currently trying to set up a Binary Pig cluster (see https://github.com/endgameinc/binarypig for details) to analyze malware binaries with Hadoop and Pig. I installed Hadoop and Pig using Cloudera CDH.

My Pig script is as follows:

SET debug 'on';

register '/home/myuser/binarypig-1.0-SNAPSHOT-jar-with-dependencies.jar';

SET mapred.cache.files /tmp/scripts#scripts;
SET mapred.create.symlink yes;

%default INPUT 'hdfs://namenode1:8020/bla/test/malware.archive.seq'
%default TIMEOUT_MS '180000'
%default USE_DEVSHM 'true'

data = load '$INPUT' using com.endgame.binarypig.loaders.ExecutingTextLoader('scripts/strings.sh',   '$TIMEOUT_MS', '$USE_DEVSHM');
DUMP data;

The bash script strings.sh simply runs the Unix "strings" command to collect all strings from each file in the malware.archive.seq container. I run the Pig script on my namenode:

pig -f strings.pig
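
For reference, strings.sh itself is not reproduced in this post. The following is only a minimal sketch of what such a wrapper might look like. It assumes that the loader calls the script with the path of one extracted binary as its first argument and reads the script's standard output; the function name extract_strings is made up for illustration, and the real script in the Binary Pig repository may differ.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for scripts/strings.sh; the real script may differ.
# Assumption: the loader invokes the script with the path of one extracted
# binary as its first argument and reads the script's standard output.

extract_strings() {
  local target="$1"
  if [ ! -r "$target" ]; then
    echo "cannot read: $target" >&2
    return 1
  fi
  # strings prints runs of at least 4 printable characters by default.
  strings "$target"
}

# Only act when a file path is supplied, so sourcing the file has no side effects.
if [ "$#" -ge 1 ]; then
  extract_strings "$1"
fi
```

Running such a script by hand against a single sample on a worker node is one way to check that it is executable and produces output outside of Pig.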

For some reason, my job always fails with the following error message:

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1440074864855_0058  data    MAP_ONLY    Message: Job failed!        hdfs://namenode1:8020/tmp/temp-362821719/tmp-171792164,

Input(s):
Failed to read data from "hdfs://namenode1:8020/bla/test/malware.zip.seq"

Output(s):
Failed to produce result in "hdfs://namenode1:8020/tmp/temp-362821719/tmp-171792164"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1440074864855_0058

2015-08-25 17:07:21,616 [main] INFO    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher -  Failed!
2015-08-25 17:07:21,616 [main] DEBUG org.apache.pig.impl.io.InterStorage -  Pig Internal storage in use
2015-08-25 17:07:21,622 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias data

The file hdfs://namenode1:8020/bla/test/malware.zip.seq does exist, and its permissions are set to 777 just to rule out permission errors.

Since my guess is that the problem is related to the load command in the Pig script, here are the debug messages for the load command:

2015-08-25 17:07:06,639 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Original macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,640 [main] DEBUG org.apache.pig.parser.QueryParserDriver - macro AST after import:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,640 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Resulting macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Original macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - macro AST after import:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Resulting macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))

Does anyone know how to solve this problem, or even how to debug it?
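
One possible starting point for debugging, sketched under assumptions: ERROR 1066 only reports that the job failed, so the underlying exception is more likely to be found in the task logs than in the Pig client output. Assuming the cluster runs YARN with log aggregation enabled, the job ID from the output above corresponds to an application ID with the same numeric suffix, and its logs can be requested with "yarn logs". The helper name job_to_app_id is made up for illustration, and the sketch only prints the command so that it can be run without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a YARN cluster with log aggregation enabled.

# Convert a MapReduce job ID (job_<clusterTimestamp>_<seq>) into the
# corresponding YARN application ID (application_<clusterTimestamp>_<seq>).
job_to_app_id() {
  local job_id="$1"
  case "$job_id" in
    job_*) printf 'application_%s\n' "${job_id#job_}" ;;
    *) echo "not a job ID: $job_id" >&2; return 1 ;;
  esac
}

app_id="$(job_to_app_id job_1440074864855_0058)"

# Print the command instead of executing it, so no cluster is needed here.
echo "yarn logs -applicationId ${app_id}"
# prints: yarn logs -applicationId application_1440074864855_0058
```

On a node with the Hadoop client installed, the printed command can be run directly and its output searched for the first exception thrown by the failed map task.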

Edit (pig_log added):

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias data

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias data
    at org.apache.pig.PigServer.openIterator(PigServer.java:892)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:478)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:884)
    ... 13 more
     ================================================================================

0 answers:

No answers yet.