Whenever I try to run this command to execute a map-reduce job written in Python, I get a ClassNotFoundException. I am currently using hadoop-2.6.5.
Input:
hadoop jar /usr/local/hadoop1/share/hadoop/tools/sources/hadoop-streaming-2.6.5-test-sources.jar \
-input /wordcount/input/student_list.txt \
-output /wordcount/output/student_list_py.txt \
-mapper /home/hduser/wordcount_py/mapper.py \
-reducer /home/hduser/wordcount_py/reducer.py
Output:
Exception in thread "main" java.lang.ClassNotFoundException: -input
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I am new to Big Data and Hadoop. Please help.
Answer 0: (score: 0)
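You are using the wrong jar to run the Hadoop Streaming job. The streaming jar is located under the Hadoop installation directory at share/hadoop/tools/lib/hadoop-streaming-2.6.5.jar. The jar you passed, hadoop-streaming-2.6.5-test-sources.jar, is a sources archive that evidently has no Main-Class entry in its manifest, so hadoop jar treats the next argument, -input, as the name of the class to run. That is why the exception names -input. Also note that -output must be a directory that does not exist yet, not a file name.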
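One way to confirm which jar is runnable is to check whether its manifest declares a Main-Class. The following is a minimal sketch using only Python's standard zipfile module; the function name jar_main_class is hypothetical and not part of Hadoop.

```python
import zipfile


def jar_main_class(jar_path):
    """Return the Main-Class declared in a jar's manifest, or None if absent."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF")
        except KeyError:
            # The archive has no manifest at all.
            return None
    manifest = raw.decode("utf-8", errors="replace").replace("\r\n", "\n")
    # Manifest values may wrap; a continuation line starts with a single space.
    manifest = manifest.replace("\n ", "")
    for line in manifest.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    return None
```

Calling jar_main_class on each of the two jar paths and comparing the results shows which one hadoop jar can launch without an explicit class name.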
Try this command:
hadoop jar /usr/local/hadoop1/share/hadoop/tools/lib/hadoop-streaming-2.6.5.jar \
-input /wordcount/input/student_list.txt \
-output /wordcount/output/ \
-mapper /home/hduser/wordcount_py/mapper.py \
-reducer /home/hduser/wordcount_py/reducer.py \
-file /home/hduser/wordcount_py/mapper.py \
-file /home/hduser/wordcount_py/reducer.py
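Because -mapper and -reducer are given as script paths, each script presumably needs a shebang line (for example #!/usr/bin/env python3) and execute permission so the task nodes can run it directly. The question does not show mapper.py or reducer.py, so the following is only a sketch of what the word-count logic might look like, assuming Python 3 and tab-separated key/value lines; map_lines and reduce_lines are hypothetical names.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer logic: sum the counts of consecutive lines sharing a key.

    Relies on the input being sorted by key, which Hadoop Streaming
    does between the map and reduce phases.
    """
    # Lines without a tab (such as blank lines) are skipped.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

# In mapper.py the script would end with something like:
#     for out in map_lines(sys.stdin): print(out)
# and reducer.py would do the same with reduce_lines (both need `import sys`).
```

The same logic can be checked locally without Hadoop by piping the input file through the mapper, then sort, then the reducer.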