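I am writing a simple MapReduce program to count the lines in a file that contain the word "Private". The map phase runs fine, but the reduce phase keeps failing. I am pasting the code here. Mapper: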
#!/usr/bin/env python
import sys

# input comes from STDIN (standard input)
# the mapper will get the number of records containing the word "Private"
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    count = 0
    string = "Private"
    string = string.strip()
    # increase the counter
    count = count + 1
    for word in words:
        if word == string:
            # write the results to STDOUT (standard output);
            # what we output here will go through the shuffle process and then
            # be the input for the Reduce step, i.e. the input for reducer.py
            print '%s\t%s' % (string, count)
Reducer:
#!/usr/bin/env python
from operator import itemgetter
import sys

current_sum = 0

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    string, count = line.split('\t', 1)
    try:
        count = float(count)
    except ValueError:
        # skip records whose count is not a number
        continue
    current_sum = current_sum + count

# emit the total once all input has been read
print '%s\t%s' % (string, current_sum)
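To check the logic outside Hadoop, here is a minimal in-process sketch of the map, sort, and reduce steps. The function names `map_lines` and `reduce_lines` and the sample lines are made up for illustration and are not part of the scripts above. Like the mapper, it emits one record per matching word, so a line containing "Private" twice is counted twice. Unlike the reducer above, it skips records that have no tab instead of raising on unpacking.

```python
def map_lines(lines, target="Private"):
    """Emit one 'target<TAB>1' record per whole-word match, like the mapper loop."""
    for line in lines:
        for word in line.strip().split():
            if word == target:
                yield "%s\t%s" % (target, 1)


def reduce_lines(records):
    """Sum the counts of tab-separated records, skipping malformed ones."""
    total = 0.0
    key = None
    for record in records:
        parts = record.strip().split("\t", 1)
        if len(parts) != 2:
            # No tab in the record: unpacking into two names would raise ValueError.
            continue
        try:
            total += float(parts[1])
        except ValueError:
            # The count field is not a number.
            continue
        key = parts[0]
    return key, total


# Hypothetical sample input; sorted() stands in for the shuffle/sort step.
sample = ["Private sector", "State-gov", "Private Private"]
shuffled = sorted(map_lines(sample))
print("%s\t%s" % reduce_lines(shuffled))
```

If this runs cleanly on a few lines of the real input, the counting logic itself is probably not what makes the task fail.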
When the job fails, I get the following message:
15/09/04 10:45:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/09/04 10:45:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/09/04 10:45:03 INFO mapred.FileInputFormat: Total input paths to process : 1
15/09/04 10:45:03 INFO mapreduce.JobSubmitter: number of splits:2
15/09/04 10:45:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441341950773_0003
15/09/04 10:45:03 INFO impl.YarnClientImpl: Submitted application application_1441341950773_0003
15/09/04 10:45:03 INFO mapreduce.Job: The url to track the job: http://meenal-Vostro-3546:8088/proxy/application_1441341950773_0003/
15/09/04 10:45:03 INFO mapreduce.Job: Running job: job_1441341950773_0003
15/09/04 10:45:09 INFO mapreduce.Job: Job job_1441341950773_0003 running in uber mode : false
15/09/04 10:45:09 INFO mapreduce.Job:  map 0% reduce 0%
15/09/04 10:45:16 INFO mapreduce.Job: Task Id : attempt_1441341950773_0003_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
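The `_m_` segment in the attempt ID marks a map task attempt, and exit code 1 is what the Python interpreter returns after an uncaught exception or a syntax error. To see the script's own error message, which the log above does not show, it may help to run each script locally and capture its exit code and stderr. This is a rough sketch, not a diagnosis. `run_streaming_script` is a hypothetical helper, and it runs the script with the interpreter executing the helper, which may differ from the `python` that the shebang resolves to on the cluster nodes.

```python
import subprocess
import sys


def run_streaming_script(script_path, input_text):
    """Feed input_text to a script on stdin and return (exit_code, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, script_path],  # assumption: same interpreter as this helper
        input=input_text,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Usage, assuming the mapper is saved as mapper.py in the current directory:
#   code, out, err = run_streaming_script("mapper.py", "Private sector\nState-gov\n")
#   print(code)
#   print(err)
```

A nonzero exit code with a traceback in `err` would point at the script rather than at Hadoop. For example, the `print '...'` statement form used in both scripts is Python 2 syntax and is a `SyntaxError` under Python 3.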