I want to compute some statistics over a list of dictionaries like this one:
list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
{'hello': "world", 'score': 1.75}]
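Specifically, I want to find the minimum, the maximum, and the normalized values of the entries stored under the score key (which means I have to update the existing dictionaries).
I have implemented the obvious approach, shown below. However, I am wondering whether there is a better way to do this?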
list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
{'hello': "world", 'score': 1.75}]
def min_value(rank_norm):
    list_values = []
    for x in rank_norm:
        list_values.append(x['score'])
    return min(list_values)
def max_value(rank_norm):
    list_values = []
    for x in rank_norm:
        list_values.append(x['score'])
    return max(list_values)
def normalize_dict(rank_norm, min_val, max_val):
    for x in rank_norm:
        x['score'] = (x['score']-min_val)/(max_val - min_val)
    return rank_norm
min_val_list = min_value(list1)
max_val_list = max_value(list1)
print(min_val_list)
print(max_val_list)
print("Original dict: ", list1)
print("Normalized dict: ", normalize_dict(list1, min_val_list, max_val_list))
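I am using Python 3.
Answer 0 (score: 3)
You can build the normalized dictionaries like this (note that this creates a new list rather than modifying the original dictionaries):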
list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
{'hello': "world", 'score': 1.75}]
values = [i["score"] for i in list1]
minimum = min(values)
maximum = max(values)
normalized_dict = [{a:b if a == "hello" else (b-minimum)/float(maximum-minimum) for a, b in i.items()} for i in list1]
Output:
[{'score': 0.24657534246575336, 'hello': 'world'}, {'score': 0.6575342465753424, 'hello': 'world'}, {'score': 0.0, 'hello': 'world'}, {'score': 1.0, 'hello': 'world'}]
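The comprehension above leaves only the key named hello unchanged, so any other key would be pushed through the arithmetic as well. Here is a minimal sketch that tests for the score key instead, so extra keys pass through untouched. The name records and the 'id' key are hypothetical and only there for illustration:

```python
# Sample data; the 'id' key is hypothetical and only shows extra keys surviving.
records = [
    {'hello': "world", 'score': 1.2, 'id': 1},
    {'hello': "world", 'score': 1.5, 'id': 2},
    {'hello': "world", 'score': 1.02, 'id': 3},
    {'hello': "world", 'score': 1.75, 'id': 4},
]

values = [d['score'] for d in records]
minimum = min(values)
span = max(values) - minimum  # assumes the scores are not all identical

# Rescale only the 'score' entry; copy every other key/value pair as-is.
normalized = [
    {k: (v - minimum) / span if k == 'score' else v for k, v in d.items()}
    for d in records
]
print(normalized)
```

Like the original one-liner, this builds new dictionaries and does not touch the ones in records.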
Answer 1 (score: 2)
Yes, you can use a generator or a list comprehension to get the minimum and maximum:
from operator import itemgetter
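def min_value(rank_norm):
    return min(map(itemgetter('score'),rank_norm))
def max_value(rank_norm):
    return max(map(itemgetter('score'),rank_norm))
Your dictionary normalization code is fine. However, you can use a list comprehension to build a new list of dictionaries. If you do not need to update the values in place, building a new list tends to be safer, because other parts of your code may still reference the old list or the old dictionaries, and you may not want those to change: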
def normalize_dict(rank_norm, min_val, max_val):
    delta = max_val-min_val
    return [dict(d,score=(d['score']-min_val)/delta) for d in rank_norm]
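To make the point about stale references concrete, here is a small sketch. The names original and alias are made up for the example; alias stands in for some other part of the program that still holds the list:

```python
def normalize_dict(rank_norm, min_val, max_val):
    delta = max_val - min_val
    # dict(d, score=...) copies d and overrides only the 'score' entry.
    return [dict(d, score=(d['score'] - min_val) / delta) for d in rank_norm]

original = [
    {'hello': "world", 'score': 1.2},
    {'hello': "world", 'score': 1.02},
    {'hello': "world", 'score': 1.75},
]
alias = original  # hypothetical second reference to the same list

result = normalize_dict(original, 1.02, 1.75)

print(alias[0]['score'])   # the raw score, unchanged
print(result[0]['score'])  # the rescaled value, stored in a new dictionary
```

With the in-place version from the question, alias would see the rescaled scores as well, which may or may not be what you want.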
If the number of items is large, you can use a pandas DataFrame to improve performance:
import pandas as pd
df = pd.DataFrame(list1)
sc = df['score']
sc_mi = sc.min()
df['score'] = (sc-sc_mi)/(sc.max()-sc_mi)
The dataframe is then:
>>> df
   hello     score
0  world  0.246575
1  world  0.657534
2  world  0.000000
3  world  1.000000
You can keep working with the dataframe, or, if you want a list of dictionaries back, you can use:
>>> list(df.T.to_dict().values())
[{'hello': 'world', 'score': 0.24657534246575336}, {'hello': 'world', 'score': 0.6575342465753424}, {'hello': 'world', 'score': 0.0}, {'hello': 'world', 'score': 1.0}]
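Answer 2 (score: 2)
You can combine the min/max computation into one step, instead of building the list of scores twice and traversing the list several times: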
from operator import itemgetter
min_val, max_val = itemgetter(0, -1)(sorted([x['score'] for x in list1]))
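Keep in mind that sorting does more work than is strictly needed just to find the two extremes. As an alternative, here is a rough sketch of a plain single-pass loop. The function name min_max_score is hypothetical, and it assumes the list is not empty:

```python
def min_max_score(records, key='score'):
    """Return (minimum, maximum) of the values under `key` in one pass."""
    scores = (d[key] for d in records)
    lowest = highest = next(scores)  # raises StopIteration on an empty list
    for s in scores:
        if s < lowest:
            lowest = s
        elif s > highest:
            highest = s
    return lowest, highest

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
         {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]

min_val, max_val = min_max_score(list1)
print(min_val, max_val)  # 1.02 1.75
```

For short lists the difference is unlikely to matter, so pick whichever reads better to you.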
Answer 3 (score: 2)
Here is a more Pythonic way to write the max and min functions:
def min_value(rank_norm):
    return min([x['score'] for x in rank_norm])
def max_value(rank_norm):
    return max([x['score'] for x in rank_norm])
Not any faster, but simpler. Also, here is the normalize function written as a one-line expression; it does not look great, but it works:
def normalize_dict(rank_norm, min_val, max_val):
    return [{'hello':x['hello'] , 'score':(x['score']-min_val)/(max_val - min_val)} for x in rank_norm]
Answer 4 (score: 2)
import pandas as pd
your_list = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
{'hello': "world", 'score': 1.75}]
#Reading in to a pandas dataframe
d = pd.DataFrame.from_dict(your_list)
your_list is now mapped to a dataframe:
print(d)
   hello  score
0  world   1.20
1  world   1.50
2  world   1.02
3  world   1.75
Compute the statistics and update the score column:
d['score'] = (d['score'] - min(d['score']))/(max(d['score']) - min(d['score']))
d now looks like this:
print(d)
   hello     score
0  world  0.246575
1  world  0.657534
2  world  0.000000
3  world  1.000000
Write the updated dataframe d back to a list of dictionaries:
updated = pd.DataFrame.to_dict(d, orient = 'records')
print(updated)
[{'score': 0.24657534246575336, 'hello': 'world'}, {'score': 0.6575342465753424, 'hello': 'world'}, {'score': 0.0, 'hello': 'world'}, {'score': 1.0, 'hello': 'world'}]
Answer 5 (score: 1)
Another approach using operator.itemgetter: sort the list by score, extract the minimum and maximum scores, then process the entries:
import operator
a = [{'hello': "world3", 'score': 1.2}, .... ]
score = operator.itemgetter('score')
a.sort(key = score)
minimum = score(a[0])
maximum = score(a[-1])
span = maximum - minimum
for d in a:
    d['score'] = (d['score'] - minimum) / span
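One thing to watch out for with any of these approaches: if every score is the same, span is zero and the division fails. Below is a hedged sketch of an in-place variant that guards against this. The function name normalize_in_place is hypothetical, and mapping everything to 0.0 in the degenerate case is just an assumed convention, so pick whatever makes sense for your data:

```python
from operator import itemgetter

def normalize_in_place(records, key='score'):
    """Min-max rescale records[i][key] in place and return the same list."""
    get = itemgetter(key)
    minimum = min(map(get, records))
    maximum = max(map(get, records))
    span = maximum - minimum
    for d in records:
        # Assumed convention: if all scores are identical, set them to 0.0.
        d[key] = (d[key] - minimum) / span if span else 0.0
    return records

a = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
     {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]
normalize_in_place(a)
print(a)

flat = [{'score': 2.0}, {'score': 2.0}]
normalize_in_place(flat)
print(flat)  # both scores become 0.0 instead of raising ZeroDivisionError
```

Unlike the sort-based version, this keeps the list in its original order.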