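I am currently trying to set up a Binary Pig cluster (see https://github.com/endgameinc/binarypig for details) to analyze malware binaries with Hadoop and Pig. I installed Hadoop and Pig using Cloudera CDH.
My Pig script is as follows: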
SET debug 'on';
register '/home/myuser/binarypig-1.0-SNAPSHOT-jar-with-dependencies.jar';
SET mapred.cache.files /tmp/scripts#scripts;
SET mapred.create.symlink yes;
%default INPUT 'hdfs://namenode1:8020/bla/test/malware.archive.seq'
%default TIMEOUT_MS '180000'
%default USE_DEVSHM 'true'
data = load '$INPUT' using com.endgame.binarypig.loaders.ExecutingTextLoader('scripts/strings.sh', '$TIMEOUT_MS', '$USE_DEVSHM');
DUMP data;
The bash script strings.sh simply runs the Unix "strings" command to collect all strings from each file in the malware.archive.seq container. I run the script on my namenode:
pig -f strings.pig
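For reference, because INPUT, TIMEOUT_MS and USE_DEVSHM are declared with %default, they can be overridden from the command line with Pig's -param option instead of editing the script. The sketch below only prints such a command line; the helper name build_pig_cmd and the 60000 ms timeout are illustrative assumptions, not part of Binary Pig.

```shell
#!/bin/sh
# Hypothetical helper: print a pig command line that overrides the
# %default values declared in strings.pig. It does not run pig itself.
build_pig_cmd() {
  input="$1"
  timeout_ms="${2:-180000}"   # falls back to the default used in the script
  printf 'pig -param INPUT=%s -param TIMEOUT_MS=%s -f strings.pig\n' \
    "$input" "$timeout_ms"
}

# Example: same input as in the script, shorter (illustrative) timeout.
build_pig_cmd 'hdfs://namenode1:8020/bla/test/malware.archive.seq' 60000
```

Printing the command first makes it easy to check which input path is actually being passed before submitting the job.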
For some reason, my job always fails with the following error message:
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1440074864855_0058 data MAP_ONLY Message: Job failed! hdfs://namenode1:8020/tmp/temp-362821719/tmp-171792164,
Input(s):
Failed to read data from "hdfs://namenode1:8020/bla/test/malware.zip.seq"
Output(s):
Failed to produce result in "hdfs://namenode1:8020/tmp/temp-362821719/tmp-171792164"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1440074864855_0058
2015-08-25 17:07:21,616 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-08-25 17:07:21,616 [main] DEBUG org.apache.pig.impl.io.InterStorage - Pig Internal storage in use
2015-08-25 17:07:21,622 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias data
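The file hdfs://namenode1:8020/bla/test/malware.zip.seq does exist, and its permissions are set to 777 just to rule out a permissions error.
Since my guess is that the problem is related to the load command in the Pig script, here are the debug messages for the load command: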
2015-08-25 17:07:06,639 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Original macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,640 [main] DEBUG org.apache.pig.parser.QueryParserDriver - macro AST after import:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,640 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Resulting macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Original macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - macro AST after import:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
2015-08-25 17:07:06,961 [main] DEBUG org.apache.pig.parser.QueryParserDriver - Resulting macro AST:
(QUERY (STATEMENT data (load 'hdfs://namenode1:8020/bla/test/malware.zip.seq' (FUNC com . endgame . binarypig . loaders . ExecutingTextLoader 'scripts/strings.sh' '180000' 'true'))))
Does anyone know how to solve this problem, or even how to debug it?
Edit (pig_log added):
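One debugging step that may help: ERROR 1066 only reports that the job failed, so the underlying exception is usually in the map task logs rather than in the Pig client output. A minimal sketch, assuming the job runs on YARN with log aggregation enabled; the helper name job_to_app_id is made up for illustration, while the job ID is the one from the failure report above.

```shell
#!/bin/sh
# Map a MapReduce job ID (as printed by Pig) to the matching YARN application ID.
job_to_app_id() {
  printf 'application_%s\n' "${1#job_}"
}

app_id="$(job_to_app_id job_1440074864855_0058)"
echo "$app_id"

# Only attempt the lookup when the yarn CLI is present on this machine.
if command -v yarn >/dev/null 2>&1; then
  # Show lines mentioning exceptions or errors, with some surrounding context.
  yarn logs -applicationId "$app_id" | grep -iE -B2 -A10 'exception|error' || true
fi
```

If log aggregation is not enabled, the same task logs should still be reachable through the ResourceManager or JobHistory web UI for that application.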
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias data
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias data
    at org.apache.pig.PigServer.openIterator(PigServer.java:892)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:478)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:884)
    ... 13 more
================================================================================