Hadoop MapReduce WordCount Python execution error

Date: 2017-10-31 18:54:27

Tags: python hadoop mapreduce cloudera word-count

I am trying to run a Python MapReduce wordcount program.

I took it from Writing a Hadoop MapReduce Program in Python, just to try to understand how it works, but the job never succeeds!

I am running mapper.py and reducer.py on a Cloudera VM with this streaming jar:

/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar
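For context, the mapper from that tutorial is a short stdin-to-stdout filter that emits one tab-separated (word, count) pair per token. A minimal sketch (the helper name map_line is mine, not the tutorial's):

```python
#!/usr/bin/env python
# mapper.py -- emit "word<TAB>1" for every word read from stdin
import sys

def map_line(line):
    # Split on whitespace and pair each token with a count of 1.
    return [(word, 1) for word in line.strip().split()]

if __name__ == "__main__":
    for line in sys.stdin:
        for word, count in map_line(line):
            print("%s\t%d" % (word, count))
```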

Execution command:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar
-D mapred.reduce.tasks=1
-file wordcount/mapper.py
-mapper mapper.py
-file wordcount/reducer.py
-reducer reducer.py
-input myinput/test.txt
-output output


1 Answer:

Answer 0 (score: 2)

The problem is in the file paths: mapper.py and reducer.py must come from the local filesystem,

but the input file must come from an HDFS path.

First, test the Python code locally with:

cat <input file> | python <local path>/mapper.py | sort -k1,1 | python <local path>/reducer.py
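The sort step matters because the tutorial's reducer only sums counts for consecutive identical words; Hadoop sorts between the map and reduce phases, so the local pipeline has to do the same. A minimal sketch of such a reducer (the helper name reduce_sorted is my own):

```python
#!/usr/bin/env python
# reducer.py -- sum counts for consecutive identical words on stdin
import sys

def reduce_sorted(pairs):
    # pairs must be grouped by word, as Hadoop's shuffle/sort guarantees.
    current_word, current_count = None, 0
    results = []
    for word, count in pairs:
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                results.append((current_word, current_count))
            current_word, current_count = word, count
    if current_word is not None:
        results.append((current_word, current_count))
    return results

if __name__ == "__main__":
    # Each input line looks like "word<TAB>count", as emitted by the mapper.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for word, total in reduce_sorted((w, int(c)) for w, c in pairs):
        print("%s\t%d" % (word, total))
```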

Then run it on Hadoop:

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar
-D mapred.reduce.tasks=1
-file <local path>/mapper.py
-mapper "python <local path>/mapper.py"
-file <local path>/reducer.py
-reducer "python <local path>/reducer.py"
-input <HDFS path>/myinput/test.txt
-output output