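I want to include third-party Python libraries when running a Hadoop Streaming job.
I followed the suggestions in the post here, but they do not seem to have any effect.
I submit the job with a command like this: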
hadoop jar /usr/local/hadoop/hadoop-2.2.0/lib/hadoop-streaming-2.2.0.jar \
-input $hdfs_input_file \
-output $hdfs_output_file \
-mapper $mapper_file \
-combiner $reducer_file \
-reducer $reducer_file \
-file $mapper_file \
-file $reducer_file \
-file $packaged_file
$packaged_file is a packaged archive that contains the third-party libraries.
My script fails at this line (in $mapper_file):
xyz = importer.load_module('library_name')
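For readers unfamiliar with the pattern, here is a minimal sketch of importing a module out of a shipped archive. It assumes the packaged file is a zip archive, which the command above does not confirm. Because the construction of importer is not shown, the sketch swaps in a different technique: it puts the archive on sys.path and calls importlib.import_module. The names packaged_demo.zip, library_name_demo, build_demo_archive and import_from_archive are hypothetical and exist only for this illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_demo_archive(directory):
    """Create a tiny zip holding a hypothetical module (demonstration only)."""
    archive_path = os.path.join(directory, "packaged_demo.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr(
            "library_name_demo.py",
            "def method_foo():\n    return 'foo'\n",
        )
    return archive_path


def import_from_archive(archive_path, module_name):
    """Add the archive to sys.path so the standard import system can read it."""
    absolute_path = os.path.abspath(archive_path)
    if absolute_path not in sys.path:
        sys.path.insert(0, absolute_path)
    # Make sure the import system notices the newly created archive.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


with tempfile.TemporaryDirectory() as scratch_dir:
    demo_archive = build_demo_archive(scratch_dir)
    xyz = import_from_archive(demo_archive, "library_name_demo")
    result = xyz.method_foo()
    print(result)
```

In a streaming task the archive path would be resolved relative to the task's working directory rather than a temporary directory, so a path that works in an interactive session is not guaranteed to work inside the job.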
The error message from the failed job is:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
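The Java trace only reports that the mapper subprocess exited with code 1; it does not show the Python exception itself. A possible way to surface it, sketched here under assumptions, is to wrap the failing call so that the traceback and sys.path are written to stderr, which Hadoop Streaming normally captures in the task logs. The helper name load_or_report and the module name no_such_library_demo are hypothetical.

```python
import importlib
import io
import sys
import traceback


def load_or_report(load, name, stream=sys.stderr):
    """Call load(name); on failure, write diagnostics to stream and return None."""
    try:
        return load(name)
    except Exception:
        stream.write("import of %r failed\n" % name)
        stream.write("cwd-relative sys.path = %r\n" % (sys.path,))
        traceback.print_exc(file=stream)
        return None


# Demonstration with a module that does not exist, capturing the report in memory.
log = io.StringIO()
missing = load_or_report(importlib.import_module, "no_such_library_demo", stream=log)
print(log.getvalue())
```

In the mapper, the equivalent call would be something like load_or_report(importer.load_module, 'library_name'), leaving the default stream so the report goes to stderr.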
However, the importer.load_module line from the mapper runs fine in ipython. I can even run the following line in ipython:
xyz.method_foo()
Any suggestions on this problem? Thanks!