I have written map.py and reduce.py programs for the word-count example. Running each program separately works fine, but the final step (submitting the full streaming job) fails with an unexpected error. How can I solve this? I am posting my map.py and reduce.py programs and the error log below.
map.py:
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print '%s\t%s' % (word, "1")
reduce.py:
import sys

c_count = {}

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    try:
        c_count[word] = c_count[word] + count
    except:
        c_count[word] = count

for word in c_count.keys():
    print '%s\t%s' % (word, c_count[word])
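For reference, the two scripts can also be tested together outside of Hadoop with a plain shell pipeline, which mimics what streaming does (this assumes a local copy of the input file, called name.txt here):

cat name.txt | python map.py | sort | python reduce.py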
The log is as follows:
18/02/14 09:47:34 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
18/02/14 09:47:35 INFO mapred.LocalJobRunner: hdfs://localhost:54310/s_data/wc_input/name.txt:0+115 > map
18/02/14 09:47:36 INFO mapreduce.Job: map 67% reduce 0%
18/02/14 09:47:38 INFO mapred.LocalJobRunner: hdfs://localhost:54310/s_data/wc_input/name.txt:0+115 > map
/home/babu/./map1.py: 4: /home/babu/./map1.py: Syntax error: word unexpected (expecting "do")
18/02/14 09:51:56 INFO streaming.PipeMapRed: MRErrorThread done
18/02/14 09:51:56 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/02/14 09:51:57 INFO mapred.LocalJobRunner: map task executor complete.
18/02/14 09:51:57 WARN mapred.LocalJobRunner: job_local771131044_0001
java.lang.Exception: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/02/14 09:51:58 INFO mapreduce.Job: Job job_local771131044_0001 failed with state FAILED due to: NA
18/02/14 09:51:58 INFO mapreduce.Job: Counters: 22
File System Counters
FILE: Number of bytes read=1124
FILE: Number of bytes written=287165
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=115
HDFS: Number of bytes written=0
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=1
Map-Reduce Framework
Map input records=1
Map output records=0
Map output bytes=0
Map output materialized bytes=0
Input split bytes=99
Combine input records=0
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=231735296
File Input Format Counters
Bytes Read=115
18/02/14 09:51:58 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
Answer (score: 2):
You can try adding #!/usr/bin/python as the first line of both map.py and reduce.py. Without that shebang line, Hadoop Streaming hands the script to the default shell, which tries to parse the Python for loop as shell syntax and fails with the Syntax error: word unexpected (expecting "do") message seen in the log.
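As a rough sketch of that fix, based on the map.py posted in the question, the mapper would then start like this (reduce.py gets the same first line):

#!/usr/bin/python
import sys

# read lines from standard input and emit "word<TAB>1" for each word
for line in sys.stdin:
    line = line.strip()
    for word in line.split():
        print '%s\t%s' % (word, "1")

It may also be worth confirming that the uploaded script is executable (chmod +x map.py), although the log shows it is already being launched, just by the wrong interpreter.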