Runtime exception when running a word count MapReduce job

Asked: 2019-01-29 20:03:41

Tags: python hadoop mapreduce yarn hadoop-streaming

I am new to big data.

I am running into a problem while trying to run a MapReduce job. I am trying to solve the word count problem.

I am using Ubuntu. The Python version installed on my machine is Python 3.6.2.

I need some help identifying the problem.

Below is my mapper.py:

import sys
import re

reload(sys)
sys.setdefaultencoding('utf-8') # required to convert to unicode

for line in sys.stdin:
    try:
        article_id, text = unicode(line.strip()).split('\t', 1)
    except ValueError as e:
        continue
    words = re.split("\W*\s+\W*", text, flags=re.UNICODE)
    for word in words:
        print >> sys.stderr, "reporter:counter:Wiki stats,Total words,%d" % 1
        print ("%s\t%d" % (word.lower(), 1))
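
For reference, the mapper above relies on Python 2 only constructs: the print >> sys.stderr statement is a syntax error under Python 3, and reload, unicode, and sys.setdefaultencoding no longer exist there. Purely for comparison, a minimal Python 3 sketch of the same logic (assuming the same tab-separated article_id<TAB>text input) might look like this:

# Python 3 sketch of the same mapper, for comparison only;
# assumes tab-separated "article_id<TAB>text" lines on stdin
import sys
import re

for line in sys.stdin:
    try:
        article_id, text = line.strip().split('\t', 1)
    except ValueError:
        continue
    words = re.split(r"\W*\s+\W*", text, flags=re.UNICODE)
    for word in words:
        # streaming counters and debug output belong on stderr
        print("reporter:counter:Wiki stats,Total words,1", file=sys.stderr)
        print("%s\t%d" % (word.lower(), 1))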

Below is my reducer.py:

import sys

current_key = None
word_sum = 0

for line in sys.stdin:
    try:
        key, count = line.strip().split('\t', 1)
        count = int(count)
    except ValueError as e:
        continue
    if current_key != key:
        if current_key:
            print "%s\t%d" % (current_key, word_sum)
        word_sum = 0
        current_key = key
    word_sum += count

if current_key:
    print "%s\t%d" % (current_key, word_sum)
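
The reducer likewise uses the Python 2 print statement, which a Python 3 interpreter rejects with a SyntaxError. Again only for comparison, a Python 3 sketch of the same reducer could be:

# Python 3 sketch of the same reducer, for comparison only
import sys

current_key = None
word_sum = 0

for line in sys.stdin:
    try:
        key, count = line.strip().split('\t', 1)
        count = int(count)
    except ValueError:
        continue
    if current_key != key:
        if current_key:
            print("%s\t%d" % (current_key, word_sum))
        word_sum = 0
        current_key = key
    word_sum += count

if current_key:
    print("%s\t%d" % (current_key, word_sum))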

I run the following command to execute the MapReduce job:

OUT_DIR="wordcount_result_"$(date +"%s%6N")
NUM_REDUCERS=8

hdfs dfs -rm -r -skipTrash ${OUT_DIR} > /dev/null

yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -D mapred.jab.name="Streaming wordCount" \
    -D mapreduce.job.reduces=${NUM_REDUCERS} \
    -files mapper.py,reducer.py \
    -mapper "python mapper.py" \
    -combiner "python reducer.py" \
    -reducer "python reducer.py" \
    -input /data/wiki/en_articles_part \
    -output ${OUT_DIR} > /dev/null
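
(As an aside, mapred.jab.name looks like a typo for the deprecated mapred.job.name; an unrecognized -D key is simply stored and ignored, so that alone should not fail the tasks.) Since streaming just pipes records through the scripts, the mapper and reducer can be exercised locally before submitting. A shell pipeline along these lines, with sample.txt standing in for a small tab-separated extract of the input, reproduces the same stdin/stdout contract the job uses and usually surfaces the underlying Python traceback directly:

# local smoke test of the streaming scripts; sample.txt is a hypothetical
# small tab-separated extract of the real input
cat sample.txt | python mapper.py | sort | python reducer.py | head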

I get the following error:

rm: `wordcount_result_1548790436309903': No such file or directory
19/01/29 19:33:59 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/29 19:33:59 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/29 19:34:00 INFO mapred.FileInputFormat: Total input files to process : 1
19/01/29 19:34:00 INFO mapreduce.JobSubmitter: number of splits:2
19/01/29 19:34:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1548784954892_0004
19/01/29 19:34:00 INFO impl.YarnClientImpl: Submitted application application_1548784954892_0004
19/01/29 19:34:00 INFO mapreduce.Job: The url to track the job: http://15e399519cda:8088/proxy/application_1548784954892_0004/
19/01/29 19:34:00 INFO mapreduce.Job: Running job: job_1548784954892_0004
19/01/29 19:34:06 INFO mapreduce.Job: Job job_1548784954892_0004 running in uber mode : false
19/01/29 19:34:06 INFO mapreduce.Job:  map 0% reduce 0%
19/01/29 19:34:12 INFO mapreduce.Job: Task Id : attempt_1548784954892_0004_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

19/01/29 19:34:12 INFO mapreduce.Job: Task Id : attempt_1548784954892_0004_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

19/01/29 19:34:17 INFO mapreduce.Job: Task Id : attempt_1548784954892_0004_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

19/01/29 19:34:17 INFO mapreduce.Job: Task Id : attempt_1548784954892_0004_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

19/01/29 19:34:22 INFO mapreduce.Job: Task Id : attempt_1548784954892_0004_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

19/01/29 19:34:23 INFO mapreduce.Job: Task Id : attempt_1548784954892_0004_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

19/01/29 19:34:27 INFO mapreduce.Job:  map 100% reduce 100%
19/01/29 19:34:28 INFO mapreduce.Job: Job job_1548784954892_0004 failed with state FAILED due to: Task failed task_1548784954892_0004_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0

19/01/29 19:34:28 INFO mapreduce.Job: Counters: 17
    Job Counters 
        Failed map tasks=7
        Killed map tasks=1
        Killed reduce tasks=8
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=24941
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=24941
        Total time spent by all reduce tasks (ms)=0
        Total vcore-milliseconds taken by all map tasks=24941
        Total vcore-milliseconds taken by all reduce tasks=0
        Total megabyte-milliseconds taken by all map tasks=25539584
        Total megabyte-milliseconds taken by all reduce tasks=0
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
19/01/29 19:34:28 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!
cat: `wordcount_result_1548790436309903/part-00000': No such file or directory
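
The streaming error above only reports the non-zero exit code of the failed subprocess; the actual Python traceback from each attempt is kept in the task logs. Provided log aggregation is enabled, something like the following, using the application id reported by the job, should retrieve them:

yarn logs -applicationId application_1548784954892_0004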

0 Answers