I have a list of dicts that all contain exactly the same keys. I want to find the average value for each key, and I'd like to know how to do that with reduce (or, if that isn't possible, some other way that is more elegant than nested for loops).

Here is the list:
[
{
"accuracy": 0.78,
"f_measure": 0.8169374016795885,
"precision": 0.8192088044235794,
"recall": 0.8172222222222223
},
{
"accuracy": 0.77,
"f_measure": 0.8159133315763016,
"precision": 0.8174754717495807,
"recall": 0.8161111111111111
},
{
"accuracy": 0.82,
"f_measure": 0.8226353934130455,
"precision": 0.8238175920455686,
"recall": 0.8227777777777778
}, ...
]
I want to get back a dictionary like this:
{
"accuracy": 0.81,
"f_measure": 0.83,
"precision": 0.84,
"recall": 0.83
}
Here is what I have so far, but I don't like it:
folds = [ ... ]
keys = folds[0].keys()
results = dict.fromkeys(keys, 0)
for fold in folds:
    for k in keys:
        results[k] += fold[k] / len(folds)
print(results)
Answer 0 (score: 7)
As an alternative, if you're going to be doing calculations like this on your data, you may want to use pandas (it's overkill for a one-off, but it will greatly simplify tasks like this...):
import pandas as pd
data = [
{
"accuracy": 0.78,
"f_measure": 0.8169374016795885,
"precision": 0.8192088044235794,
"recall": 0.8172222222222223
},
{
"accuracy": 0.77,
"f_measure": 0.8159133315763016,
"precision": 0.8174754717495807,
"recall": 0.8161111111111111
},
{
"accuracy": 0.82,
"f_measure": 0.8226353934130455,
"precision": 0.8238175920455686,
"recall": 0.8227777777777778
}, # ...
]
result = pd.DataFrame.from_records(data).mean().to_dict()
This gives you:
{'accuracy': 0.79000000000000004,
'f_measure': 0.8184953755563118,
'precision': 0.82016728940624295,
'recall': 0.81870370370370382}
Answer 1 (score: 4)
Here it is, using reduce():
from functools import reduce # Python 3 compatibility
summed = reduce(
lambda a, b: {k: a[k] + b[k] for k in a},
list_of_dicts,
dict.fromkeys(list_of_dicts[0], 0.0))
result = {k: v / len(list_of_dicts) for k, v in summed.items()}
This produces a starting point with 0.0 values for the keys taken from the first dictionary, then sums all the values (per key) into the final dictionary. The sums are then divided by the number of dicts to produce the averages.
Demo:
>>> from functools import reduce
>>> list_of_dicts = [
... {
... "accuracy": 0.78,
... "f_measure": 0.8169374016795885,
... "precision": 0.8192088044235794,
... "recall": 0.8172222222222223
... },
... {
... "accuracy": 0.77,
... "f_measure": 0.8159133315763016,
... "precision": 0.8174754717495807,
... "recall": 0.8161111111111111
... },
... {
... "accuracy": 0.82,
... "f_measure": 0.8226353934130455,
... "precision": 0.8238175920455686,
... "recall": 0.8227777777777778
... }, # ...
... ]
>>> summed = reduce(
... lambda a, b: {k: a[k] + b[k] for k in a},
... list_of_dicts,
... dict.fromkeys(list_of_dicts[0], 0.0))
>>> summed
{'recall': 2.4561111111111114, 'precision': 2.4605018682187287, 'f_measure': 2.4554861266689354, 'accuracy': 2.37}
>>> {k: v / len(list_of_dicts) for k, v in summed.items()}
{'recall': 0.8187037037037038, 'precision': 0.820167289406243, 'f_measure': 0.8184953755563118, 'accuracy': 0.79}
>>> from pprint import pprint
>>> pprint(_)
{'accuracy': 0.79,
'f_measure': 0.8184953755563118,
'precision': 0.820167289406243,
'recall': 0.8187037037037038}
Answer 2 (score: 2)
You can do the summing elegantly with Counter:
from collections import Counter
summed = sum((Counter(d) for d in folds), Counter())
averaged = {k: v/len(folds) for k, v in summed.items()}
If you really like that style, it can even be turned into a one-liner:
averaged = {
k: v/len(folds)
for k, v in sum((Counter(d) for d in folds), Counter()).items()
}
Either way, I think it's more readable than the convoluted reduce(); sum() is itself an appropriately specialized version of it. (One caveat: Counter addition discards non-positive values, which is fine here since all the metrics are positive.)
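To make the "specialized reduce" point concrete, here is a minimal sketch (with toy data standing in for the real folds) showing that this sum() call is just reduce() over operator.add:

from collections import Counter
from functools import reduce
import operator

folds = [{"accuracy": 0.78}, {"accuracy": 0.77}, {"accuracy": 0.82}]  # toy data

# sum() with a Counter start value...
via_sum = sum((Counter(d) for d in folds), Counter())
# ...is just reduce() over operator.add with the same initializer.
via_reduce = reduce(operator.add, (Counter(d) for d in folds), Counter())
assert via_sum == via_reduce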
An even simpler one-liner that needs no imports at all:
averaged = {
k: sum(d[k] for d in folds)/len(folds)
for k in folds[0]
}
Interestingly, it's also considerably faster (even faster than pandas?!), and the statistic is easier to swap out. I tried replacing the manual calculation with the statistics.mean() function from Python 3.5, but that made it about 10x slower.
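For what it's worth, here is a rough sketch of how such a comparison could be timed with timeit (the data is made up and the numbers will vary by machine, so treat it as an illustration rather than a benchmark):

import timeit

setup = "folds = [{'accuracy': 0.78, 'f_measure': 0.82, 'precision': 0.82, 'recall': 0.82}] * 1000"
stmt = "averaged = {k: sum(d[k] for d in folds) / len(folds) for k in folds[0]}"

# Total time for 100 runs of the import-free one-liner above.
print(timeit.timeit(stmt, setup=setup, number=100))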
Answer 3 (score: 0)
Here's a horrible one-liner using a list comprehension. You're probably better off not using it.
final = dict(zip(lst[0].keys(), [n/len(lst) for n in [sum(i) for i in zip(*[tuple(x1.values()) for x1 in lst])]]))
for key, value in final.items():
    print(key, value)
# Output:
recall 0.818703703704
precision 0.820167289406
f_measure 0.818495375556
accuracy 0.79
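For anyone trying to decipher it, here is a sketch that unpacks the same one-liner into named steps (using toy data; like the one-liner, it relies on every dict yielding its keys and values in the same order):

lst = [{"accuracy": 0.78}, {"accuracy": 0.77}, {"accuracy": 0.82}]  # toy data

value_rows = [tuple(x1.values()) for x1 in lst]  # one tuple of values per dict
value_cols = zip(*value_rows)                    # transpose: one tuple per key
sums = [sum(col) for col in value_cols]          # sum each key's column
means = [n / len(lst) for n in sums]             # divide by the number of dicts
final = dict(zip(lst[0].keys(), means))          # pair the averages with the keys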
Answer 4 (score: -1)
Here's another way, step by step:
from functools import reduce
d = [
{
"accuracy": 0.78,
"f_measure": 0.8169374016795885,
"precision": 0.8192088044235794,
"recall": 0.8172222222222223
},
{
"accuracy": 0.77,
"f_measure": 0.8159133315763016,
"precision": 0.8174754717495807,
"recall": 0.8161111111111111
},
{
"accuracy": 0.82,
"f_measure": 0.8226353934130455,
"precision": 0.8238175920455686,
"recall": 0.8227777777777778
}
]
key_arrays = {}
for item in d:
    for k, v in item.items():
        key_arrays.setdefault(k, []).append(v)

ave = {k: reduce(lambda x, y: x + y, v) / len(v) for k, v in key_arrays.items()}
print(ave)
# {'accuracy': 0.79, 'recall': 0.8187037037037038,
# 'f_measure': 0.8184953755563118, 'precision': 0.820167289406243}
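As a side note, reduce(lambda x, y: x + y, v) is just a long-winded sum(v), so the final line could be simplified to:

ave = {k: sum(v) / len(v) for k, v in key_arrays.items()}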