I have a tuple containing roughly 27,000 id:price pairs, organized like this:
((13217L, 15100004.27),
(27673L, 39070007.7),
(23133L, 7000001.03),
(31760L, 7600122.02),
(21611L, 28402830.02),
(19699L, 7500001.11),
(15753L, 50215503.2299),
(19117L, 61350002.11),
(30106L, 11121000.05),
)
Within this huge tuple, the same id can appear with several different prices, like this:
(21611L, 28402830.02)
(21611L, 23000007.0)
(21611L, 28402653.6)
(21611L, 28403875.37)
(21611L, 28403875.38)
(21611L, 28403000.0)
(21611L, 28402845.71)
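My question is: if I want a new tuple/dict/list (it doesn't matter which) that contains only the lowest of all the prices associated with a given id, or the highest of all the prices for a given id, what is the fastest way to build it?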
Answer 0 (score: 2)
"The fastest way" is underspecified, but you can use itertools.groupby on a list of the (id, price) pairs sorted by id:
from itertools import groupby
from operator import itemgetter
key = itemgetter(0)
maxprices = {id_: max(g)[1] for id_, g in groupby(sorted(pairs, key=key), key=key)}
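Here, pairs is your tuple and maxprices is a dict mapping each id to its highest price. Replace max(g) with min(g) to get the lowest price instead.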
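As a minimal sketch of the same groupby idea, the snippet below collects both the lowest and the highest price per id in one grouping pass. The sample pairs are a few rows copied from the question with the Python 2 `L` suffix dropped, and `minprices` is a name introduced here for illustration only.

```python
from itertools import groupby
from operator import itemgetter

# A few (id, price) pairs from the question; plain ints replace 13217L etc.
pairs = (
    (13217, 15100004.27),
    (21611, 28402830.02),
    (21611, 23000007.0),
    (21611, 28403875.38),
    (19699, 7500001.11),
)

by_id = itemgetter(0)

minprices = {}
maxprices = {}
# groupby only merges adjacent items, so the pairs are sorted by id first.
for id_, group in groupby(sorted(pairs, key=by_id), key=by_id):
    # A group is a one-shot iterator, so materialize the prices before reuse.
    prices = [price for _, price in group]
    minprices[id_] = min(prices)
    maxprices[id_] = max(prices)

print(minprices[21611])  # 23000007.0
print(maxprices[21611])  # 28403875.38
```

Sorting dominates the cost here, which is why this approach pays off mainly when the input is already ordered by id.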
Answer 1 (score: 1)
You can use a defaultdict:
import random
import collections
import time
from itertools import groupby
from operator import itemgetter
# TODO: run N trials, record the name of each method, and show a ranking at the very end.
#-------------------------------------
randomPairsList=[]
for i in range(1000000):
    for j in range(1, random.randint(2,6)):
        randomPairsList.append([i,j])
sortedTuple = tuple(randomPairsList)
random.shuffle(randomPairsList)
unsortedTuple = tuple(randomPairsList)
#-------------------------------------
t0 = time.time()
key = itemgetter(0)
minprices = {id_: min(g)[1] for id_, g in groupby(sorted(sortedTuple, key=key), key=key)}
print "groupby - SORTED:\t\t\t\t\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
key = itemgetter(0)
minprices = {id_: min(g)[1] for id_, g in groupby(sorted(unsortedTuple, key=key), key=key)}
print "groupby - UNSORTED:\t\t\t\t\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
d = collections.defaultdict(lambda: None)
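for key, value in sortedTuple:
    # Python 2 only: None compares below every number, so min(None, value) stays None
    # and no real minimum is stored. On Python 3 this comparison raises TypeError.
    d[key]=min(d[key], value)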
print "\ndefaultdict (bad way) - SORTED:\t\t\t\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
d = collections.defaultdict(lambda: None)
for key, value in unsortedTuple:
    d[key]=min(d[key], value)
print "defaultdict (bad way) - UNSORTED:\t\t\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
d = collections.defaultdict(lambda: None) # Update: use list as the factory if you want to append values; None is better here.
for key, value in sortedTuple:
    d[key]=min(d[key] or value, value)
print "\ndefaultdict (nicer, Python3 compatible!) - SORTED:\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
d = collections.defaultdict(lambda: None) # Update: use list as the factory if you want to append values; None is better here.
for key, value in unsortedTuple:
    d[key]=min(d[key] or value, value)
print "defaultdict (nicer, Python3 compatible!) - UNSORTED:\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
d = dict()
for key, value in sortedTuple:
    d[key]=min(d.get(key, value), value)
print "\ndict (using parameter) - SORTED:\t\t\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
d = dict()
for key, value in unsortedTuple:
    d[key]=min(d.get(key, value), value)
print "dict (using parameter) - UNSORTED:\t\t\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
d = dict()
for key, value in sortedTuple:
    d[key]=min(d.get(key) or value, value)
print "\ndict (not using parameter) - SORTED:\t\t\t"+str(time.time()-t0)
#-------------------------------------
t0 = time.time()
d = dict()
for key, value in unsortedTuple:
    d[key]=min(d.get(key) or value, value)
print "dict (not using parameter) - UNSORTED:\t\t\t"+str(time.time()-t0)
#-------------------------------------
When the tuple is already sorted, groupby is faster than defaultdict, but when it is unsorted groupby is slower. I got these timings (in seconds):
groupby - SORTED: 0.796000003815
groupby - UNSORTED: 4.63300013542
defaultdict (bad way) - SORTED: 1.10599994659
defaultdict (bad way) - UNSORTED: 1.96099996567
defaultdict (nicer, Python3 compatible!) - SORTED: 1.11000013351
defaultdict (nicer, Python3 compatible!) - UNSORTED: 1.95299983025
dict (using parameter) - SORTED: 1.23400020599
dict (using parameter) - UNSORTED: 2.09599995613
dict (not using parameter) - SORTED: 1.14100003242
dict (not using parameter) - UNSORTED: 1.98699998856
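The dict-based variants above need no sorting, and the same single pass can track the lowest and the highest price together. Below is a rough sketch under a few assumptions: `price_range_by_id` is a hypothetical helper name, and the 0.0 price for id 30106 is made up to show one edge case. It uses an explicit membership test rather than `d[key] or value`, because `or` treats a stored price of 0 as missing and would overwrite it.

```python
def price_range_by_id(pairs):
    """Return {id: (lowest_price, highest_price)} in one pass over pairs."""
    ranges = {}
    for id_, price in pairs:
        if id_ in ranges:
            low, high = ranges[id_]
            ranges[id_] = (min(low, price), max(high, price))
        else:
            # First time this id is seen: the price is both lowest and highest.
            ranges[id_] = (price, price)
    return ranges


# Two ids from the question; the 0.0 entry is a made-up edge case.
pairs = (
    (21611, 28402830.02),
    (21611, 23000007.0),
    (30106, 0.0),
    (30106, 11121000.05),
)
ranges = price_range_by_id(pairs)
print(ranges[21611])  # (23000007.0, 28402830.02)
print(ranges[30106])  # (0.0, 11121000.05)
```

Whether this beats the groupby version depends on the input order, as the timings above suggest, so it is worth measuring on your own data.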