I have the following data:
Name Year score
A 1996 84
A 1997 65
A 1996 76
A 1998 78
A 1998 65
B 1998 53
B 1996 98
B 1996 83
B 1996 54
I want output like this, with the highest score for each name and the year it was achieved:
Name Year max_score
A 1996 84
B 1996 98
How do I write Python MapReduce code for this job?
I could combine NAME and YEAR into a single key and use the score as the value.
But is there another way to solve this?
Answer 0 (score: 2)
Assuming all of your years and scores are positive:
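from collections import defaultdict

# name -> (year, best score so far); (0, 0) works because scores are positive
mapping = defaultdict(lambda: (0, 0))
with open(datafile) as f:  # datafile: path to the input file
    for line in f:
        try:
            # unpack inside the try so blank or short lines are skipped too
            name, year, score = line.split()
            year = int(year)
            score = int(score)
        except ValueError:
            continue  # header row or malformed line
        if score > mapping[name][1]:
            mapping[name] = year, score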
Or, slightly more concise but less robust against malformed input:
from collections import defaultdict

mapping = defaultdict(lambda: (0, 0))
with open(datafile) as f:
    f.readline()  # header. Don't need it.
    for line in f:
        name, year, score = line.split()
        if int(score) > mapping[name][1]:
            mapping[name] = int(year), int(score)
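To check the approach against the data in the question, here is a minimal self-contained sketch. The function name `best_scores` and the inline `SAMPLE` string are my own, introduced only for illustration; the logic is the same defaultdict idea, just fed from a string instead of a file.

```python
from collections import defaultdict

# Sample rows copied from the question, header included.
SAMPLE = """\
Name Year score
A 1996 84
A 1997 65
A 1996 76
A 1998 78
A 1998 65
B 1998 53
B 1996 98
B 1996 83
B 1996 54
"""

def best_scores(lines):
    """Return {name: (year, score)} with the highest score seen per name.

    Hypothetical helper; assumes scores are positive integers.
    """
    best = defaultdict(lambda: (0, 0))
    for line in lines:
        try:
            name, year, score = line.split()
            year, score = int(year), int(score)
        except ValueError:
            continue  # header row, blank line, or malformed line
        if score > best[name][1]:
            best[name] = (year, score)
    return dict(best)

result = best_scores(SAMPLE.splitlines())

print("Name Year max_score")
for name in sorted(result):
    year, score = result[name]
    print(name, year, score)
```

Because the comparison is a strict `>`, a tie on the score keeps the row that was seen first.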
Answer 1 (score: 0)
Is this what you are after?
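import operator

def mapper(key, value):
    # value is one data line such as "A 1996 84" (header assumed removed); key is unused
    name, year, score = value.split()
    # ints, so max() compares scores numerically rather than as strings
    yield name, (int(year), int(score))

def reducer(name, values):
    yield name, max(values, key=operator.itemgetter(1))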
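If you want to try this without a cluster, the map, group-by-key, and reduce steps can be simulated locally. This is only a rough sketch: `run_local` is a hypothetical stand-in for whatever framework you end up using, not a real API, and the mapper here additionally converts the fields to integers and skips lines it cannot parse (such as the header row).

```python
import operator
from collections import defaultdict

def mapper(key, value):
    # key is unused (here it is just the line index); value is one input line
    try:
        name, year, score = value.split()
        record = (int(year), int(score))
    except ValueError:
        return  # header row or malformed line: emit nothing
    yield name, record

def reducer(name, values):
    # values holds every (year, score) pair emitted for this name
    yield name, max(values, key=operator.itemgetter(1))

def run_local(lines):
    """Hypothetical local driver: map every line, group by key, then reduce."""
    groups = defaultdict(list)
    for index, line in enumerate(lines):
        for key, value in mapper(index, line):
            groups[key].append(value)
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results

rows = [
    "Name Year score",
    "A 1996 84", "A 1997 65", "A 1996 76", "A 1998 78", "A 1998 65",
    "B 1998 53", "B 1996 98", "B 1996 83", "B 1996 54",
]

output = run_local(rows)
for name, (year, score) in output:
    print(name, year, score)
```

Keying on the name alone is what gives one row per name; keying on name and year together would instead produce one maximum per name and year.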