I create a dict with list values that contains a number of parameters. These parameters are mostly floats or ints, and sometimes booleans (stored as 0 or 1 in my case). Now I want to pick the best entry of the dict (= the one with the highest parameters), so I need to normalise the parameters so that each of them only lies in the range 0 ... 1. A naive approach is to build a list of maxima, one for each list "column", and then divide all values by that maximum:
import heapq

a = {1: [1.0, 23.7, 17.5, 0.2],
     2: [0.0, 87.3, 11.2, 0.5],
     3: [1.0, 17.4, 15.2, 0.7]}

ran = len(a.values()[0])
# column-wise maxima
max = [0.0 for i in range(0, ran)]
for vals in a.values():
    max = [max[x] if max[x] > vals[x] else vals[x] for x in range(0, ran)]
# divide every value by its column maximum
a = {k: [v[x]/max[x] for x in range(0, ran)] for k, v in a.items()}
# entry with the highest sum of normalised parameters
best = heapq.nlargest(1, (v for v in a.values()), key=lambda v: sum(v))
print a
print best
This seems to work here, but is there anything I can optimise? The dicts I have to process will contain more than 1000 entries, and there will be somewhere between 20 and 50 parameters. I also need to do this on roughly 1000 such dicts, so a fast approach would help a lot.
Edit: I have now tested it with generated data:
import heapq
import random

def normalise(a):
    ran = len(a.values()[0])
    max = [0.0 for i in range(0, ran)]
    for vals in a.values():
        max = [max[x] if max[x] > vals[x] else vals[x] for x in range(0, ran)]
    a = {k: [v[x]/max[x] for x in range(0, ran)] for k, v in a.items()}
    # find best list
    best = heapq.nlargest(1, (v for v in a.values()), key=lambda v: sum(v))

# test this 1000 times
for _ in xrange(1000):
    a = {k: [1000.0*random.random() for i in xrange(50)] for k in xrange(1000)}
    normalise(a)
and got the following result:
25,84s user 0,02s system 49% cpu 52,189 total, running python normalise.py
Answer 0 (score: 4)
You want to loop directly over the dict and process each list as you go:
from operator import itemgetter

best = (0, [])
maxes = [max(c) for c in zip(*a.values())]
for k, v in a.iteritems():
    v = a[k] = [c/m for c, m in zip(v, maxes)]
    best = max([best, (sum(v), v)], key=itemgetter(0))
This uses zip(*iterable) to loop over the columns of a. Each row is then normalised against the per-column maxima, and the best row is picked out in the same pass.
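For illustration, this is what the column transposition looks like on the sample dict from the question (a quick REPL check of my own, not part of the answer):

>>> cols = zip(*a.values())        # one tuple per parameter column
>>> [max(c) for c in cols]
[1.0, 87.3, 17.5, 0.7]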
Note that heapq.nlargest(1, ...) has simply been replaced by max, as that is the more efficient approach here.
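In other words, when you only need the single largest element, the following two lines are equivalent, with max doing a single pass and no heap bookkeeping (my own comparison, assuming sum as the key in both cases):

best = heapq.nlargest(1, a.values(), key=sum)[0]   # returns a one-element list
best = max(a.values(), key=sum)                    # returns the element directly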
Timings measured with the timeit module, compared against your original sample:
>>> from timeit import timeit
>>> from operator import itemgetter
>>> import heapq
>>> def original(a):
...     ran = len(a.values()[0])
...     max = [0.0 for i in range(0,ran)]
...     for vals in a.values():
...         max = [max[x] if max[x] > vals[x] else vals[x] for x in range(0,ran)]
...     a = {k : [v[x]/max[x] for x in range(0,ran)] for k,v in a.items()}
...     best = heapq.nlargest(1, (v for v in a.values()), key=lambda v: sum(v))
...
>>> def zip_and_max(a):
...     best = (0, [])
...     maxes = [max(c) for c in zip(*a.values())]
...     for k, v in a.iteritems():
...         v = a[k] = [c/m for c, m in zip(v, maxes)]
...         best = max([best, (sum(v), v)], key=itemgetter(0))
...
>>> timeit('f(a.copy())', 'from __main__ import a, original as f', number=100000)
2.6306018829345703
>>> timeit('f(a.copy())', 'from __main__ import a, zip_and_max as f', number=100000)
1.6974060535430908
And with one random set:
>>> import random
>>> random_a = { k: [1000.0*random.random() for i in xrange(50)] for k in xrange(1000)}
>>> timeit('f(a.copy())', 'from __main__ import a, original as f', number=100000)
2.7121059894561768
>>> timeit('f(a.copy())', 'from __main__ import a, zip_and_max as f', number=100000)
1.745398998260498
And with a new random set for every run (note the far lower repeat count):
>>> timeit('f(r())', 'from __main__ import random_dict as r, original as f', number=100)
4.437845945358276
>>> timeit('f(r())', 'from __main__ import random_dict as r, zip_and_max as f', number=100)
3.2406938076019287
But it sounds as if you are really handling matrices here. You'll want to look at numpy for a far more efficient library to handle those matrices.
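A minimal sketch of what that could look like with numpy, using the sample data from the question; the array layout and the argmax-based selection here are my own illustration, not part of this answer:

import numpy as np

a = {1: [1.0, 23.7, 17.5, 0.2],
     2: [0.0, 87.3, 11.2, 0.5],
     3: [1.0, 17.4, 15.2, 0.7]}

keys = list(a)
m = np.array([a[k] for k in keys])       # shape: (entries, parameters)
m = m / m.max(axis=0)                    # divide every column by its maximum
best_key = keys[m.sum(axis=1).argmax()]  # key of the row with the highest total
print best_key, a[best_key]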
Answer 1 (score: 1)
This is all you need:
key, best = max(a.iteritems(), key = lambda t: sum(t[1])/max(t[1]))
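For reference, applied to the sample dict from the question this selects entry 3 (my own quick check; note that this one-liner scores each row by its sum divided by that row's own maximum, rather than by per-column maxima):

>>> key, best = max(a.iteritems(), key=lambda t: sum(t[1])/max(t[1]))
>>> key, best
(3, [1.0, 17.4, 15.2, 0.7])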