Hadoop MapReduce task failed with exit code 143

Time: 2016-10-31 17:47:09

Tags: python hadoop mapreduce

I am currently learning to use Hadoop mapred and have come across this error:

packageJobJar: [/home/hduser/mapper.py, /home/hduser/reducer.py, /tmp/hadoop-unjar4635332780289131423/] [] /tmp/streamjob8641038855230304864.jar tmpDir=null
16/10/31 17:41:12 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:13 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:15 INFO mapred.FileInputFormat: Total input paths to process : 1
16/10/31 17:41:17 INFO mapreduce.JobSubmitter: number of splits:2
16/10/31 17:41:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477933345919_0004
16/10/31 17:41:19 INFO impl.YarnClientImpl: Submitted application application_1477933345919_0004
16/10/31 17:41:19 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1477933345919_0004/
16/10/31 17:41:19 INFO mapreduce.Job: Running job: job_1477933345919_0004
16/10/31 17:41:38 INFO mapreduce.Job: Job job_1477933345919_0004 running in uber mode : false
16/10/31 17:41:38 INFO mapreduce.Job:  map 0% reduce 0%
16/10/31 17:41:56 INFO mapreduce.Job:  map 100% reduce 0%
16/10/31 17:42:19 INFO mapreduce.Job: Task Id : attempt_1477933345919_0004_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

I cannot figure out how to fix this error and have been searching the web. The code I am using for my mapper is:

import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()

    for word in words:
        print '%s\t%s' % (word, 1)

The code for my reducer is:

from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)

    try:
        count = int(count)
    except ValueError:
        continue

    if current_word == word:
        current_count += count
    else:
        if current_word:
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

if current_word == word:
    print '%s\t%s' % (current_word, current_count)

To run the job I am using:

hduser@master:/opt/hadoop-2.7.3/share/hadoop/tools/lib $ hadoop jar hadoop-streaming-2.7.3.jar -file /home/hduser/mapper.py -mapper "python mapper.py" -file /home/hduser/reducer.py -reducer "python reducer.py" -input ~/testDocument -output ~/results1
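(For reference, since a streaming mapper and reducer just read stdin and write stdout, the pair can also be sanity-checked outside Hadoop with a plain shell pipeline; sample.txt below is a hypothetical local test file, and the sort step stands in for Hadoop's shuffle, which the reducer relies on:)

$ cat sample.txt | python mapper.py | sort | python reducer.py   # sort replaces the Hadoop shuffle for this local test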

Any help would be appreciated, as I am new to Hadoop. Please don't hesitate to ask if more logs or information are needed.

1 Answer:

Answer 0 (score: 0)

Look in the logs for the error coming from your Python code. Exit code 143 only means the container was killed by the ApplicationMaster after the attempt failed; the real problem is the "subprocess failed with code 1" from your script, so the Python stack trace is what you want to find. On EMR / YARN you can get the logs from the web UI or from a shell on the cluster master as shown below (your application ID will differ; it is printed when the job starts). There is a lot of output, so redirect it to a file as I do here and search it for the Python stack trace.

$ yarn logs -applicationId application_1503951120983_0031 > /tmp/log
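Once the logs are in a file, something along these lines (just an illustration against the /tmp/log file written by the command above) pulls out the Python traceback from the failed attempts:

$ grep -n -A 20 "Traceback" /tmp/log   # print each Python traceback header plus 20 lines of context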