Question

I'm new to Hadoop and MapReduce, and I'm trying to write a MapReduce job that finds the 10 highest counts in a word-count txt file.

My txt file 'q2_result.txt' looks like:

yourself        268
yourselves      73
yoursnot        1
youst   1
youth   270
youthat 1
youthful        31
youths  9
youtli  1
youwell 1
youwondrous     1
youyou  1
zanies  1
zany    1
zeal    32
zealous 6
zeals   1

Mapper:

#!/usr/bin/env python

import sys

for line in sys.stdin:
    line = line.strip()
    word, count = line.split()
    print "%s\t%s" % (word, count)

Reducer:

#!usr/bin/env/ python

import sys

top_n = 0
for line in sys.stdin:
    line = line.strip()
    word, count = line.split()

    top_n += 1
    if top_n == 11:
        break
    print '%s\t%s' % (word, count)

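I know you can pass flags with the -D option in the hadoop jar command to sort on the key you want (in my case the count, which is k2,2), but here I'm just using a simple command first: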

hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.0.0-1245.jar -file /root/LAB3/mapper.py -mapper mapper.py -file /root/LAB3/reducer.py -reducer reducer.py -input /user/root/lab3/q2_result.txt -output /user/root/lab3/test_out

So I figured such a simple mapper and reducer shouldn't give me errors, but they do, and I can't figure out why. The error is here: http://pastebin.com/PvY4d89c

(I'm running the Hortonworks HDP Sandbox in VirtualBox on Ubuntu 16.04)
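
A side note on the top-10 goal: the reducer above prints the first ten lines it receives, which are the ten largest counts only if the input reaches it already sorted by count. A minimal sketch of a reducer that picks the ten largest counts itself is shown here, assuming the job runs with a single reducer and that every input line is a word followed by an integer count. The function name top_n and the sample data are illustrative, not part of the original scripts.

```python
#!/usr/bin/env python
import heapq


def top_n(lines, n=10):
    """Return the n (word, count) pairs with the largest counts."""
    def parse(stream):
        for line in stream:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            try:
                yield fields[0], int(fields[1])
            except ValueError:
                continue  # skip lines whose count is not an integer

    # nlargest keeps only n candidates in memory while it scans the input.
    return heapq.nlargest(n, parse(lines), key=lambda pair: pair[1])


# Small demonstration on a few lines from q2_result.txt.
# In reducer.py, pass sys.stdin instead of this sample list.
sample = ["yourself\t268", "youth\t270", "zeal\t32", "zany\t1"]
for word, count in top_n(sample, n=2):
    print("%s\t%d" % (word, count))
```

With more than one reducer, each reducer would emit its own top ten, so the outputs would still need to be merged.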

Answer 1

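I know, a "file not found" error means something completely different from a "file cannot be executed" error. In this case the problem is that the file cannot be executed.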

In reducer.py:

Wrong:

#!usr/bin/env/ python

Correct:

#!/usr/bin/env python
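
A mistake like this can be caught locally before the job is submitted. The following is a rough sketch, not part of Hadoop, of a helper that inspects the first line of a script. The function name shebang_problems and the specific checks are assumptions made for illustration. It covers only the shebang line, not the file's own permissions.

```python
import os


def shebang_problems(path):
    """Return a list of problems found in the shebang line of a script."""
    problems = []
    with open(path, "rb") as handle:
        raw = handle.readline()

    # Windows line endings leave a stray carriage return on the shebang line.
    if raw.endswith(b"\r\n"):
        problems.append("shebang line ends with CRLF (Windows line ending)")

    first_line = raw.decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        problems.append("first line is not a shebang")
        return problems

    parts = first_line[2:].split()
    if not parts:
        problems.append("shebang names no interpreter")
        return problems

    # The first token after "#!" is the program the kernel will try to run.
    interpreter = parts[0]
    if not os.path.isabs(interpreter):
        problems.append("interpreter path is not absolute: %s" % interpreter)
    elif not os.path.isfile(interpreter):
        problems.append("interpreter does not exist: %s" % interpreter)
    elif not os.access(interpreter, os.X_OK):
        problems.append("interpreter is not executable: %s" % interpreter)
    return problems
```

Calling shebang_problems("reducer.py") on the reducer from the question would flag the relative interpreter path usr/bin/env/, while the corrected line passes on a system where /usr/bin/env exists.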
