I have this signature:
def aggregate_by_player_id(input, playerid, fields):
"fields" is the list of fields in "input" to aggregate (sum), grouped by "playerid".
I call the function like this:
aggregate_by_player_id(input, 'player', ['stat1','stat3'])
The input looks like this:
[{'player': '1', 'stat1': '3', 'stat2': '4', 'stat3': '5'},
{'player': '1', 'stat1': '1', 'stat2': '4', 'stat3': '1'},
{'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '3'},
{'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '1'},
{'player': '3', 'stat1': '4', 'stat2': '1', 'stat3': '6'}]
My output structure is:
nested_dic = {value_of_playerid1: {'playerid': value_of_playerid1, 'stat1': value_of_stat1, 'stat3': value_of_stat3},
              value_of_playerid2: {'playerid': value_of_playerid2, 'stat1': value_of_stat1, 'stat3': value_of_stat3},
              value_of_playerid3: {'playerid': value_of_playerid3, 'stat1': value_of_stat1, 'stat3': value_of_stat3}}
So the output should look like this:
{'1': {'player': '1', 'stat1': 4, 'stat3': 6},
'2': {'player': '2', 'stat1': 2, 'stat3': 4},
'3': {'player': '3', 'stat1': 4, 'stat3': 6}}
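A quick sanity check of the expected numbers, computed by hand for one player. This is only an illustration (the name rows is a local copy of the input above); it also shows that every value in the input is a string, so each one has to be converted to int before summing.

```python
# Input rows copied from the question; note every value is a string.
rows = [
    {'player': '1', 'stat1': '3', 'stat2': '4', 'stat3': '5'},
    {'player': '1', 'stat1': '1', 'stat2': '4', 'stat3': '1'},
    {'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '3'},
    {'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '1'},
    {'player': '3', 'stat1': '4', 'stat2': '1', 'stat3': '6'},
]

# Keep only player '1' and convert each stat to int before summing;
# summing the raw strings with sum() would raise a TypeError.
player_1 = [r for r in rows if r['player'] == '1']
stat1_total = sum(int(r['stat1']) for r in player_1)  # 3 + 1
stat3_total = sum(int(r['stat3']) for r in player_1)  # 5 + 1
print(stat1_total, stat3_total)  # 4 6
```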
Answer 0 (score: 2)
For this we can use itertools.groupby to group the rows by playerid, and then sum the values of each requested field.
from itertools import groupby
from operator import itemgetter

def aggregate_by_player_id(input_, playerid, fields):
    player = itemgetter(playerid)
    output = {}
    # groupby only merges *consecutive* rows that share the same key
    for k, v in groupby(input_, key=player):
        data = list(v)
        stats = {playerid: k}
        for field in fields:
            # the input values are strings, so convert to int before summing
            stats[field] = sum(int(d.get(field, 0)) for d in data)
        output[k] = stats
    return output
# data is the input list from the question; it must be pre-sorted on the grouping key
data.sort(key=itemgetter('player'))
results = aggregate_by_player_id(data, 'player', ['stat1', 'stat3'])
print(results)
{'1': {'player': '1', 'stat1': 4, 'stat3': 6},
'2': {'player': '2', 'stat1': 2, 'stat3': 4},
'3': {'player': '3', 'stat1': 4, 'stat3': 6}}
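Why the sort matters: itertools.groupby starts a new group every time the key changes, so equal keys that are not adjacent end up in separate groups. A small illustration with made-up rows:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows where the two '1' entries are not next to each other.
rows = [
    {'player': '1', 'stat1': '3'},
    {'player': '2', 'stat1': '1'},
    {'player': '1', 'stat1': '1'},
]
key = itemgetter('player')

# Without sorting, groupby yields the key '1' twice.
unsorted_keys = [k for k, _ in groupby(rows, key=key)]
print(unsorted_keys)  # ['1', '2', '1']

# After sorting on the same key, each player forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key=key)]
print(sorted_keys)  # ['1', '2']
```

In aggregate_by_player_id, the second '1' group would then overwrite output['1'], silently dropping the first row's stat1 of 3 instead of raising an error.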
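Answer 1 (score: 1)
You could get the result you want in a one-liner, but it would probably be less readable. Here is a simple function that does the job: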
data = [
{'player': '1', 'stat1': '3', 'stat2': '4', 'stat3': '5'},
{'player': '1', 'stat1': '1', 'stat2': '4', 'stat3': '1'},
{'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '3'},
{'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '1'},
{'player': '3', 'stat1': '4', 'stat2': '1', 'stat3': '6'}
]
def aggregate_dicts(ds, id_field, aggr_fields):
    result = {}
    for d in ds:
        identifier = d[id_field]
        if identifier not in result:
            # first row for this identifier: start every field at 0
            result[identifier] = {f: 0 for f in aggr_fields}
        for f in aggr_fields:
            # the input values are strings, so convert before adding
            result[identifier][f] += int(d[f])
    return result
print(aggregate_dicts(data, 'player', ['stat1', 'stat3']))
Result:
{'1': {'stat1': 4, 'stat3': 6}, '2': {'stat1': 2, 'stat3': 4}, '3': {'stat1': 4, 'stat3': 6}}
If you want the identifier repeated inside each nested dictionary, just add this line to the if block:
result[identifier][id_field] = identifier
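With that line placed inside the if block, the complete function would look roughly like this (a sketch; the input list from the question is repeated so the snippet runs on its own):

```python
def aggregate_dicts(ds, id_field, aggr_fields):
    result = {}
    for d in ds:
        identifier = d[id_field]
        if identifier not in result:
            # First row for this identifier: start every field at 0 ...
            result[identifier] = {f: 0 for f in aggr_fields}
            # ... and repeat the identifier inside the nested dict.
            result[identifier][id_field] = identifier
        for f in aggr_fields:
            # Input values are strings, so convert before adding.
            result[identifier][f] += int(d[f])
    return result


# Input list from the question.
data = [
    {'player': '1', 'stat1': '3', 'stat2': '4', 'stat3': '5'},
    {'player': '1', 'stat1': '1', 'stat2': '4', 'stat3': '1'},
    {'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '3'},
    {'player': '2', 'stat1': '1', 'stat2': '2', 'stat3': '1'},
    {'player': '3', 'stat1': '4', 'stat2': '1', 'stat3': '6'},
]

result = aggregate_dicts(data, 'player', ['stat1', 'stat3'])
print(result)
```

The result compares equal to the expected output in the question. The only visible difference is key order: 'player' comes last in each nested dict because it is inserted after the zeroed fields. Building the dict as {id_field: identifier, **{f: 0 for f in aggr_fields}} would put it first. Unlike the groupby answer, this version needs no pre-sorting.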