Python program using sys.stdin fails - Hadoop Streaming

Asked: 2015-01-25 22:17:37

Tags: python hadoop mapreduce stdin hadoop-streaming

I am trying to learn Hadoop Streaming. I wrote a three-line Python program just to check that everything works, but I am stuck.

Code:

#!/usr/bin/env python

import sys

for line in sys.stdin:
    print "Inside Loop"

The command I ran:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar  -file './test.py'  -mapper './test.py' -input ./sample.txt -output ./outfile

The error I got:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

15/01/25 17:09:55 INFO mapreduce.Job: Task Id : attempt_1418762215449_0069_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

I only want to check whether I can read a file stored in HDFS, so I am running just a mapper for this.

Can someone tell me what is wrong here?

2 answers:

Answer 0 (score: 2)

This is a security issue: note the lines mentioning UserGroupInformation

  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

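For a beginner this is unlikely to be a simple fix (I have several years of Hadoop experience and don't particularly enjoy troubleshooting permission/ACL problems). I suggest working with someone who has set up clusters before.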

Answer 1 (score: 1)

When I got this error, it was caused by the script having CRLF line endings instead of LF.

You can check for that.
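One way to check, as a minimal sketch: with CRLF endings the shebang line effectively reads `#!/usr/bin/env python\r`, so `env` looks for an interpreter literally named `python\r`, and exit code 127 is the conventional "command not found" status. The helper names `has_crlf` and `convert_to_lf` below are made up for illustration, and the demo writes a throwaway file rather than touching the real `test.py`.

```python
import os
import tempfile


def has_crlf(path):
    """Return True if the file at `path` contains any CRLF line ending."""
    # Binary mode, so Python does not translate newlines while reading.
    with open(path, "rb") as handle:
        return b"\r\n" in handle.read()


def convert_to_lf(path):
    """Rewrite the file in place, replacing every CRLF with LF."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(data.replace(b"\r\n", b"\n"))


# Demonstration on a temporary file that mimics a script saved with CRLF.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "test.py")
with open(demo_path, "wb") as handle:
    handle.write(b"#!/usr/bin/env python\r\nimport sys\r\n")

crlf_before = has_crlf(demo_path)   # checked before conversion
convert_to_lf(demo_path)
crlf_after = has_crlf(demo_path)    # checked after conversion
print("CRLF before: %s, after: %s" % (crlf_before, crlf_after))
```

To apply this to the actual mapper, call `has_crlf("./test.py")` from the directory you submit the job from, and run `convert_to_lf` on it only if the check returns True.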

I found the answer here.