Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1, works perfectly locally

Date: 2018-08-29 11:39:20

Tags: python hadoop-streaming

I have searched every forum for this error, but with no luck. I am getting the following error:

18/08/29 00:24:53 INFO mapreduce.Job:  map 0% reduce 0%
18/08/29 00:24:59 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)


18/08/29 00:25:45 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

18/08/29 00:25:52 INFO mapreduce.Job:  map 100% reduce 100%
18/08/29 00:25:53 INFO mapreduce.Job: Job job_1535105716146_0226 failed with state FAILED due to: Task failed task_1535105716146_0226_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0


18/08/29 00:25:53 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

I also tried the map-reduce code with standalone Python commands:

cat student1.txt | python mapper.py | python reducer.py

The code works perfectly fine. However, when I run it through Hadoop streaming, it repeatedly throws the above error. My input file is only 3 KB. I also tried running the Hadoop streaming command after switching the Python version, with no luck. I also added the `#!/usr/bin/python` line at the top of the scripts. Nothing ends up inside the output directory. I also tried different versions of the command:
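Worth noting: Hadoop sorts the mapper output by key before it reaches the reducer, so a closer local equivalent of the streaming job is `cat student1.txt | python mapper.py | sort | python reducer.py`. Below is a minimal in-process sketch of that mapper, sort, reducer flow (hypothetical word-count logic, not the actual student scripts):

```python
# Simulates "mapper | sort | reducer": the sort step is exactly what the
# plain "mapper.py | reducer.py" pipe test leaves out.
def mapper(lines):
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reducer(pairs):
    # Assumes pairs arrive grouped by key, as after Hadoop's shuffle/sort.
    last, total = None, 0
    for key, count in pairs:
        if key == last:
            total += count
        else:
            if last is not None:
                yield (last, total)
            last, total = key, count
    if last is not None:
        yield (last, total)

lines = ["b a b", "a b"]
shuffled = sorted(mapper(lines))  # Hadoop inserts this sort between the phases
result = dict(reducer(shuffled))
print(result)  # {'a': 2, 'b': 3}
```

A reducer that assumes grouped keys can pass the unsorted local pipe test and still behave differently on the cluster, so this is worth ruling out first.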

Version 1:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper mapper.py -file /home/reducer.py -reducer reducer.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

Version 2: with the python commands in quotes

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper "python mapper.py" -file /home/reducer.py -reducer "python reducer.py" -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

A simple word-count program runs successfully on this environment and produces correct output, but as soon as I add the mysql.connector dependency to the Python scripts, Hadoop streaming reports this error. I also dug through the job logs but found nothing useful there.

2 answers:

Answer 0 (score: 0):

I checked the job error logs and put the required Python files (the ones that are not predefined libraries) next to the scripts. Then I ran the Hadoop streaming command shipping those Python files as well:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=0 -file /home/mapper3.py -mapper mapper3.py -file /home/reducer3.py -reducer reducer3.py -file /home/ErrorHandle.py -file /home/ExceptionUtil.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt
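Shipping the helpers works because every path passed with -file is copied into the task's current working directory, so `import ErrorHandle` resolves there. As a hedged sketch (my addition, reusing the file names from the command above), a mapper can check its environment at startup and write real diagnostics to stderr, which YARN keeps in the task logs, instead of dying with an opaque exit code 1:

```python
import os
import sys
import traceback

def missing_shipped(names):
    """Files passed via -file/-files land in the task's current working
    directory; return the ones that did not make it there."""
    return [n for n in names if not os.path.exists(n)]

def safe_import(name):
    """Import a module, logging the full traceback to stderr (captured in
    the YARN task logs) instead of crashing the subprocess silently."""
    try:
        return __import__(name)
    except ImportError:
        traceback.print_exc(file=sys.stderr)
        return None

# At the top of the real mapper one might do:
#   if missing_shipped(["ErrorHandle.py", "ExceptionUtil.py"]): sys.exit(1)
#   if safe_import("mysql.connector") is None: sys.exit(1)
print(missing_shipped(["no_such_helper_xyz.py"]))  # ['no_such_helper_xyz.py']
print(safe_import("json") is not None)             # True
```

This also covers the mysql.connector case from the question: the import can succeed on the machine where the local pipe test ran but fail on cluster nodes where the package is not installed, and the traceback then shows up in the task's stderr log.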

Answer 1 (score: 0):

If your problem is not related to a Python library or a code issue, it may be related to the first-line comments of your Python files and your operating system.

In my case, on macOS, after installing Hadoop locally with this tutorial (tuto), the Python mapper/reducer did not execute properly. Errors: `java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1` or `java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127`

My configuration:

  • Hadoop 3.2.1_1
  • Python 3.7.6
  • macOS Mojave 10.14.6
  • the Java version installed by the tutorial (adoptopenjdk8): "1.8.0_252"

To launch the job with Python, I use the new `mapred streaming` command instead of the `hadoop jar /xxx/hadoop-mapreduce/hadoop-streaming-xxx.jar` form from the Hadoop documentation. (Note: in my opinion the documentation's examples do not work with the generic options; `-file` is deprecated, the new option is `-files`.)

I found two possibilities:

  1. Keep the Python files' first line unchanged: `# -*- coding: utf-8 -*-`

Only this command worked for me:

mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper "python WordCountMapper.py" \
-reducer "python WordCountReducer.py"

This assumes the local input file was first copied to HDFS with `hadoop fs -copyFromLocal /absolute-local-folder/data/input/README.TXT /data/input`. The job then counts the words of /data/input/README.TXT (already on the HDFS volume) using WordCountMapper.py and WordCountReducer.py.

The code of WordCountMapper.py:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

for line in sys.stdin:
    # strip leading/trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()

    # map step: for each word, emit the pair (word, 1)
    for word in words:
        print("%s\t%d" % (word, 1))

The code of WordCountReducer.py:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
total = 0
lastword = None

for line in sys.stdin:
    line = line.strip()

    # split the line into key and value, converting the value to int
    word, count = line.split()
    count = int(count)

    # move on to the next word (several keys can arrive within a single
    # run of the program, grouped and sorted by Hadoop's shuffle)
    if lastword is None:
        lastword = word
    if word == lastword:
        total += count
    else:
        print("%s\t%d occurrences" % (lastword, total))
        total = count
        lastword = word

if lastword is not None:
    print("%s\t%d occurrences" % (lastword, total))

  2. Or edit the Python files so they can be executed directly:

2.1. Make the Python files executable:

chmod +x WordCountMapper.py

chmod +x WordCountReducer.py

2.2. Add these two lines at the top of each file:

first line: `#!/usr/bin/python`

second line: `# -*- coding: utf-8 -*-`

Then use this command:

mapred streaming -files WordCountMapper.py,WordCountReducer.py \
-input /data/input/README.TXT \
-output /data/output \
-mapper ./WordCountMapper.py \
-reducer ./WordCountReducer.py
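To sanity-check the grouping logic of WordCountReducer.py above without a cluster, the same algorithm can be exercised in-process on pre-sorted `word\tcount` lines (a sketch of mine that mirrors the reducer, not part of the original answer):

```python
# In-process check of the reducer's grouping logic: feed it tab-separated
# "word\tcount" lines, already sorted by key as Hadoop guarantees after
# the shuffle, and collect (word, total) pairs instead of printing them.
def reduce_lines(lines):
    results = []
    total = 0
    lastword = None
    for line in lines:
        word, count = line.strip().split("\t")
        count = int(count)
        if lastword is None:
            lastword = word
        if word == lastword:
            total += count
        else:
            results.append((lastword, total))
            total = count
            lastword = word
    if lastword is not None:
        results.append((lastword, total))
    return results

print(reduce_lines(["a\t1", "a\t1", "b\t1"]))  # [('a', 2), ('b', 1)]
```

Note that the logic only aggregates correctly when equal keys are adjacent, which is why the `sort` that Hadoop performs between the map and reduce phases (and which a plain local pipe test omits) matters.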