I have an input file:
Chicago 500
Newyork 200
California 100
For each city, I need to output the difference between its second-column value and that of every other city:
Chicago Newyork 300
Chicago California 400
Newyork Chicago -300
Newyork California 100
California Chicago -400
California Newyork -100
I have tried a lot but could not find a correct way to implement this in MapReduce. Please suggest a solution.
Answer 0 (score: 1)
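Here is some pseudocode. I use Python a lot, so it looks much like Python. For this to work, you have to know the total number of lines (i.e. the number of cities here) and set N to that value before running the job.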
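map(dummy, line):
    city, pop = line.split()
    pop = int(pop)  # convert to a number so the subtraction in reduce works
    for idx in range(N):  # replicate each record under every key 0..N-1
        emit(idx, (city, pop))

reduce(idx, city_data):
    # every reducer receives all N records; sort by city so indices are consistent
    city_data = sorted(city_data)
    city, pop = city_data[idx]  # this reducer is responsible for city number idx
    for i in range(N):
        if idx != i:
            c, p = city_data[i]
            diff = pop - p
            emit(city, (c, diff))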
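
To check the logic without a cluster, the pseudocode above can be simulated in plain Python. This is only a local sketch, not a Hadoop job: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, the shuffle step is imitated with a dictionary, and it assumes each line holds exactly one city name without spaces followed by an integer.

```python
from collections import defaultdict


def map_fn(line, n):
    """Emit the (city, pop) record once under each key 0..n-1."""
    city, pop = line.split()
    for idx in range(n):
        yield idx, (city, int(pop))


def reduce_fn(idx, city_data):
    """Emit the differences between city number idx and every other city."""
    city_data = sorted(city_data)  # same order in every reducer
    city, pop = city_data[idx]
    for i, (other, other_pop) in enumerate(city_data):
        if i != idx:
            yield city, other, pop - other_pop


def run_job(lines):
    """Imitate map, shuffle (group by key), and reduce in one process."""
    n = len(lines)  # N must be known before the job runs
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line, n):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reduce_fn(key, groups[key]))
    return results


if __name__ == "__main__":
    data = ["Chicago 500", "Newyork 200", "California 100"]
    for city, other, diff in run_job(data):
        print(city, other, diff)
```

Note that every record is sent to all N keys, so the shuffled data grows as N squared. That is fine for a handful of cities but may become expensive for large inputs.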