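I want to understand how MapReduce code is executed on a Hadoop 2.9.0 multi-node cluster, and in particular which node processes the input. Specifically, how can I find out which mapper processes each part of the input data? I ran the following Python code on the master: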
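#!/usr/bin/env python
import sys
import socket

# Hostname of the machine this mapper process is running on
hostname = socket.gethostname()
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        # Emit: word <TAB> 1 <TAB> hostname
        print('%s\t%s\t%s' % (word, 1, hostname))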
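The hostname alone does not show how many map tasks ran or which part of the input each one read. A possible variant of this mapper is sketched below. It assumes that Hadoop Streaming exports the job configuration to the mapper's environment with dots replaced by underscores, so that `mapreduce.task.attempt.id`, `mapreduce.map.input.file` and `mapreduce.map.input.start` become `mapreduce_task_attempt_id`, `mapreduce_map_input_file` and `mapreduce_map_input_start`. The older names (`mapred_task_id`, `map_input_file`, `map_input_start`) are tried as fallbacks on the assumption that some setups still export them, and the helper names `env_first` and `map_lines` are hypothetical.

```python
#!/usr/bin/env python
"""Sketch of a streaming mapper that tags each word with the host, task
attempt and input split (file:start offset) that produced it.

Assumption: Hadoop Streaming exposes job configuration as environment
variables with '.' replaced by '_'. Missing variables become "unknown".
"""
import os
import socket
import sys


def env_first(env, *names):
    """Return the value of the first name set in env, else "unknown"."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return "unknown"


def map_lines(lines, host, env):
    """Yield 'word<TAB>1<TAB>host<TAB>attempt<TAB>file:start' records."""
    # Assumed variable names; the second name in each call is an older alias.
    attempt = env_first(env, "mapreduce_task_attempt_id", "mapred_task_id")
    input_file = env_first(env, "mapreduce_map_input_file", "map_input_file")
    start = env_first(env, "mapreduce_map_input_start", "map_input_start")
    split = "%s:%s" % (input_file, start)
    for line in lines:
        for word in line.split():
            yield "%s\t%s\t%s\t%s\t%s" % (word, 1, host, attempt, split)


if __name__ == "__main__":
    for record in map_lines(sys.stdin, socket.gethostname(), os.environ):
        print(record)
```

Distinct attempt IDs in the output correspond to distinct map tasks, and the `file:start` field shows which split each task read. If every record carries the same attempt ID, only one map task processed the whole input, regardless of how many nodes the cluster has.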
I use socket.gethostname() to get the hostname of the node the mapper runs on. I expected the output of this mapper to look something like this:
Bye 1 hadoopmaster
Goodbye 1 hadoopmaster
Hadoop 1 hadoopmaster
Hadoop 1 hadoopslave1
Hello 1 hadoopmaster
Hello 1 hadoopslave2
But the actual output was:
Bye 1 hadoopmaster
Goodbye 1 hadoopmaster
Hadoop 1 hadoopmaster
Hadoop 1 hadoopmaster
Hello 1 hadoopmaster
Hello 1 hadoopmaster
Does this mean the code is not running on the slave nodes?
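Checking a large output by eye is error-prone. The following is a minimal sketch that counts how many records each host emitted. It assumes each line has the hostname in its third whitespace-separated field, as produced by the mapper in the question. Input could be, for example, the job output piped from `hdfs dfs -cat` over the part files. The function name `records_per_host` is hypothetical.

```python
"""Sketch: count mapper records per hostname.

Assumes lines of the form 'word 1 hostname' (tabs or spaces), i.e. the
hostname is the third whitespace-separated field. Shorter lines are skipped.
"""
import sys
from collections import Counter


def records_per_host(lines):
    """Return a Counter mapping hostname -> number of records."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    for host, n in sorted(records_per_host(sys.stdin).items()):
        print("%s\t%d" % (host, n))
```

A single hostname in the result means every record was emitted by map tasks on that one node.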