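Computing the minimum and maximum in a list of dictionaries to normalize the dictionary values

Asked: 2017-10-12 13:59:02

Tags: python python-3.x

I want to compute some statistics over a list of dictionaries like this one: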

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
         {'hello': "world", 'score': 1.75}]

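Specifically, I want to find the minimum and maximum of the values stored under the score key, and then normalize those values (which means I have to update the existing dictionaries).

I have implemented the obvious approach, shown below. Is there a better way to do this?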

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
         {'hello': "world", 'score': 1.75}]


def min_value(rank_norm):
    list_values = []
    for x in rank_norm:
        list_values.append(x['score'])
    return min(list_values)


def max_value(rank_norm):
    list_values = []
    for x in rank_norm:
        list_values.append(x['score'])
    return max(list_values)


def normalize_dict(rank_norm, min_val, max_val):
    for x in rank_norm:
        x['score'] = (x['score']-min_val)/(max_val - min_val)
    return rank_norm

min_val_list = min_value(list1)
max_val_list = max_value(list1)

print(min_val_list)
print(max_val_list)

print("Original dict:  ", list1)
print("Normalized dict: ", normalize_dict(list1, min_val_list, max_val_list))

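I am using Python 3.

6 Answers:

Answer 0 (score: 3)

You can compute the normalized dictionaries like this (note that the comprehension builds a new list, normalized_dict, and leaves list1 itself unchanged):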

list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
     {'hello': "world", 'score': 1.75}]
values = [i["score"] for i in list1]
minimum = min(values)
maximum = max(values)
normalized_dict = [{a:b if a == "hello" else (b-minimum)/float(maximum-minimum) for a, b in i.items()} for i in list1]

Output:

[{'score': 0.24657534246575336, 'hello': 'world'}, {'score': 0.6575342465753424, 'hello': 'world'}, {'score': 0.0, 'hello': 'world'}, {'score': 1.0, 'hello': 'world'}]
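
The comprehension above decides what to copy by checking for the "hello" key, so it only works for dictionaries with exactly these two keys. A minimal sketch of a more general variant follows; the helper name normalize_key and its key parameter are made up for illustration, and it assumes the values are not all equal (otherwise the span is zero).

```python
def normalize_key(records, key="score"):
    """Return new dicts in which records[i][key] is min-max scaled to [0, 1].

    Hypothetical helper, not part of the original answer.
    Assumes records is non-empty and the values under `key` are not all equal.
    """
    values = [d[key] for d in records]
    minimum, maximum = min(values), max(values)
    span = maximum - minimum
    # Scale only the chosen key; copy every other key/value pair unchanged.
    return [
        {k: (v - minimum) / span if k == key else v for k, v in d.items()}
        for d in records
    ]


list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
         {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]
normalized = normalize_key(list1)
```

As with the original comprehension, list1 is left untouched and normalized is a new list.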

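Answer 1 (score: 2)

Pure Python

Yes, you can get the minimum and maximum from a lazy iterable (here map with itemgetter) or from a list comprehension: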

from operator import itemgetter

def min_value(rank_norm):
    return min(map(itemgetter('score'),rank_norm))

def max_value(rank_norm):
    return max(map(itemgetter('score'),rank_norm))

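Your normalization code is fine. However, you can use a list comprehension to build a new list of dictionaries instead. If you do not need to update the values in place, building a new list is usually safer, because other parts of your code may still hold references to the old list or the old dictionaries, and you may not want those to change: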

def normalize_dict(rank_norm, min_val, max_val):
    delta = max_val-min_val
    return [dict(d,score=(d['score']-min_val)/delta) for d in rank_norm]
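
To make the point about references concrete, here is a small sketch that repeats normalize_dict from above and checks that the input is left alone. The names alias and result are only for illustration.

```python
from operator import itemgetter


def normalize_dict(rank_norm, min_val, max_val):
    delta = max_val - min_val  # assumes max_val != min_val
    # dict(d, score=...) copies d and overrides only the 'score' entry.
    return [dict(d, score=(d['score'] - min_val) / delta) for d in rank_norm]


list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
         {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]
alias = list1  # another reference to the same list, as other code might hold

scores = list(map(itemgetter('score'), list1))
result = normalize_dict(list1, min(scores), max(scores))

print(alias[0]['score'])  # still 1.2: the original dictionaries were not modified
print(result is list1)    # False: a new list was built
```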

pandas

If the number of items is large, you can use a pandas DataFrame to improve performance:

import pandas as pd

df = pd.DataFrame(list1)
sc = df['score']
sc_mi = sc.min()
df['score'] = (sc-sc_mi)/(sc.max()-sc_mi)

The DataFrame then looks like this:

>>> df
   hello     score
0  world  0.246575
1  world  0.657534
2  world  0.000000
3  world  1.000000

You can keep working with the DataFrame, or, if you want a list of dictionaries back, use:

>>> list(df.T.to_dict().values())
[{'hello': 'world', 'score': 0.24657534246575336}, {'hello': 'world', 'score': 0.6575342465753424}, {'hello': 'world', 'score': 0.0}, {'hello': 'world', 'score': 1.0}]

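Answer 2 (score: 2)

You can merge the min/max computation into a single step, instead of building the list of scores twice and iterating over it several times: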

from operator import itemgetter

min_val, max_val = itemgetter(0, -1)(sorted([x['score'] for x in list1]))
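
Sorting gives both values from one expression, but it does O(n log n) work and builds a full sorted copy. If a true single pass matters, a plain loop can track both extremes at once. This is only a sketch; the helper name min_max is hypothetical.

```python
def min_max(values):
    """Return (smallest, largest) from one pass over a non-empty iterable.

    Hypothetical helper, not part of the original answer.
    """
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("min_max() needs at least one value") from None
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:  # a value below lo cannot also be above hi
            hi = v
    return lo, hi


list1 = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5},
         {'hello': "world", 'score': 1.02}, {'hello': "world", 'score': 1.75}]
min_val, max_val = min_max(x['score'] for x in list1)
```

For short lists the difference is unlikely to be measurable, so the sorted version may still be the more readable choice.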

Answer 3 (score: 2)

Here is a more Pythonic way to write the max and min functions:

def min_value(rank_norm):
    return min([x['score'] for x in rank_norm])

def max_value(rank_norm):
    return max([x['score'] for x in rank_norm])

Not any faster, but simpler. Also, here is the normalize function written as a one-line expression; it does not look great, but it works:

def normalize_dict(rank_norm, min_val, max_val):
    return [{'hello':x['hello'] , 'score':(x['score']-min_val)/(max_val - min_val)} for x in rank_norm]
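
The one-liner has to name every key it wants to keep. A possible variant, assuming Python 3.5 or later, uses dictionary unpacking to copy all keys and override only score, and passes generator expressions to min and max so no intermediate list is built. The 'id' key in the sample data is made up to show that extra keys survive.

```python
def min_value(rank_norm):
    return min(x['score'] for x in rank_norm)


def max_value(rank_norm):
    return max(x['score'] for x in rank_norm)


def normalize_dict(rank_norm, min_val, max_val):
    delta = max_val - min_val  # assumes max_val != min_val
    # {**x, 'score': ...} copies every key of x, then replaces 'score'.
    return [{**x, 'score': (x['score'] - min_val) / delta} for x in rank_norm]


# 'id' is a hypothetical extra key, not present in the question's data.
list1 = [{'id': 0, 'hello': "world", 'score': 1.2},
         {'id': 1, 'hello': "world", 'score': 1.5},
         {'id': 2, 'hello': "world", 'score': 1.02},
         {'id': 3, 'hello': "world", 'score': 1.75}]
result = normalize_dict(list1, min_value(list1), max_value(list1))
```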

Answer 4 (score: 2)

pandas

import pandas as pd

your_list = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.5}, {'hello': "world", 'score': 1.02},
     {'hello': "world", 'score': 1.75}]

#Reading in to a pandas dataframe
d = pd.DataFrame.from_dict(your_list)

your_list is now mapped to a DataFrame

print(d)
   hello  score
0  world   1.20
1  world   1.50
2  world   1.02
3  world   1.75

Compute the statistics and update score

d['score'] = (d['score'] - d['score'].min()) / (d['score'].max() - d['score'].min())

What d looks like now:

print(d)
   hello     score
0  world  0.246575
1  world  0.657534
2  world  0.000000
3  world  1.000000

Write the updated DataFrame d back to a list of dictionaries

updated = d.to_dict(orient='records')
print(updated)

[{'score': 0.24657534246575336, 'hello': 'world'}, {'score': 0.6575342465753424, 'hello': 'world'}, {'score': 0.0, 'hello': 'world'}, {'score': 1.0, 'hello': 'world'}]

Answer 5 (score: 1)

Another approach using operator.itemgetter: sort the list by score, extract the minimum and maximum scores, then process..

import operator
a = [{'hello': "world3", 'score': 1.2},  .... ]

score = operator.itemgetter('score')
a.sort(key = score)
minimum = score(a[0])
maximum = score(a[-1])
span = maximum - minimum
for d in a:
    d['score'] = (d['score'] - minimum) / span
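
Two things to keep in mind with this version: a.sort reorders the list, and if every score is equal then span is 0 and the division raises ZeroDivisionError. A rough sketch that keeps the original order and guards the zero-span case is below. The function name normalize_in_place, the sample data, and the choice to map equal scores to 0.0 are all assumptions made for illustration.

```python
import operator

score = operator.itemgetter('score')


def normalize_in_place(records):
    """Min-max scale records[i]['score'] in place, keeping the list order.

    Hypothetical helper. When all scores are equal, every score becomes 0.0;
    that is an arbitrary choice, pick whatever fits your use case.
    """
    if not records:
        return records
    minimum = score(min(records, key=score))
    maximum = score(max(records, key=score))
    span = maximum - minimum
    for d in records:
        d['score'] = (d['score'] - minimum) / span if span else 0.0
    return records


# Illustrative data, not from the original answer.
a = [{'hello': "world3", 'score': 2.0}, {'hello': "world1", 'score': 4.0},
     {'hello': "world2", 'score': 3.0}]
normalize_in_place(a)

same = [{'hello': "world", 'score': 1.2}, {'hello': "world", 'score': 1.2}]
normalize_in_place(same)
```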