无法在Hadoop中使用python运行map reduce?

时间:2014-07-09 18:53:36

标签: python hadoop mapreduce hadoop2

我在python中编写了mapper和reducer for word count program,工作正常。 这是一个示例:

echo "hello hello world here hello here world here hello" | wordmapper.py | sort -k1,1 | wordreducer.py 
hello   4
here    3
world   2

现在,当我尝试为大文件提交hadoop作业时,我会收到错误

hadoop jar share/hadoop/tools/sources/hadoop-*streaming*.jar -file wordmapper.py -mapper wordmapper.py  -file wordreducer.py -reducer wordreducer.py -input /data/1jrl.pdb -output /output/py_jrl
Exception in thread "main" java.lang.ClassNotFoundException: share.hadoop.tools.sources.hadoop-streaming-2.2.0-test-sources.jar
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

我删除了将命令行更改为以下内容(从上面删除了外卡);

hadoop jar share/hadoop/tools/sources/hadoop-streaming-2.2.0-sources.jar -file wordmapper.py -mapper wordmapper.py  -file wordreducer.py -reducer wordreducer.py -input /data/1jrl.pdb -output /output/py_jrl
Exception in thread "main" java.lang.ClassNotFoundException: -file
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

为什么我会收到这些错误以及如何解决此问题? 我用hadoop2.谢谢!

1 个答案:

答案 0 :(得分:3)

至少有一个问题是您使用的-sources.jar只是.java个文件而且无法执行。

尝试使用此代替......

share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar

如果不存在,请查找文件名中没有hadoop-streaming*.jar的{​​{1}}。