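Finding the maximum value among id:value pairs in a tuple when the same id appears several times

Time: 2014-06-15 13:01:25

Tags: python tuples

I have a tuple containing roughly 27,000 id:price pairs, organized like this: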

((13217L, 15100004.27),
(27673L, 39070007.7),
(23133L, 7000001.03),
(31760L, 7600122.02),
(21611L, 28402830.02),
(19699L, 7500001.11),
(15753L, 50215503.2299),
(19117L, 61350002.11),
(30106L, 11121000.05),
)

Within this huge tuple, the same id can appear with several different prices, like this:

(21611L, 28402830.02)
(21611L, 23000007.0)
(21611L, 28402653.6)
(21611L, 28403875.37)
(21611L, 28403875.38)
(21611L, 28403000.0)
(21611L, 28402845.71)

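My question is: if I want a new tuple/dict/list (it doesn't matter which) that contains only the lowest of all the prices associated with a given id, or only the highest, what is the fastest way to do it?

2 answers:

Answer 0: (score: 2)

"Fastest way" is underspecified, but you can use itertools.groupby on a list of (ID, price) pairs sorted by ID: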

from itertools import groupby
from operator import itemgetter

key = itemgetter(0)

maxprices = {id_: max(g)[1] for id_, g in groupby(sorted(pairs, key=key), key=key)}

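Here, pairs would be your tuple, and maxprices would be a dictionary mapping each ID to its highest price. Use min(g) instead of max(g) to get the lowest price.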
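As a self-contained illustration of the groupby approach, here is a minimal Python 3 sketch. The sample data is a handful of pairs taken from the question (without the Python 2 `L` suffix), and the helper name `extreme_prices` is hypothetical, not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

# Sample data: a few (id, price) pairs copied from the question.
pairs = (
    (13217, 15100004.27),
    (21611, 28402830.02),
    (21611, 23000007.0),
    (21611, 28403875.38),
    (27673, 39070007.7),
)

def extreme_prices(pairs, pick=max):
    """Map each id to its highest (pick=max) or lowest (pick=min) price."""
    by_id = itemgetter(0)
    # groupby only merges adjacent items, so sort by id first.
    ordered = sorted(pairs, key=by_id)
    return {
        id_: pick(price for _, price in group)
        for id_, group in groupby(ordered, key=by_id)
    }

max_prices = extreme_prices(pairs, max)
min_prices = extreme_prices(pairs, min)
```

The sort is what dominates the cost here, so this form is most attractive when the pairs already arrive ordered by id.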

Answer 1: (score: 1)

You can use a defaultdict:

import random
import collections
import time

from itertools import groupby
from operator import itemgetter

# TODO: run N tests, record the name of each approach, and show a ranking at the very end.

#-------------------------------------

randomPairsList=[]

for i in range(1000000):
    for j in range(1, random.randint(2,6)):
        randomPairsList.append([i,j])

sortedTuple = tuple(randomPairsList)
random.shuffle(randomPairsList)
unsortedTuple = tuple(randomPairsList)

#-------------------------------------

t0 = time.time()

key = itemgetter(0)

minprices = {id_: min(g)[1] for id_, g in groupby(sorted(sortedTuple, key=key), key=key)}

print "groupby - SORTED:\t\t\t\t\t"+str(time.time()-t0)

#-------------------------------------

t0 = time.time()

key = itemgetter(0)

minprices = {id_: min(g)[1] for id_, g in groupby(sorted(unsortedTuple, key=key), key=key)}

print "groupby - UNSORTED:\t\t\t\t\t"+str(time.time()-t0)

#-------------------------------------  

t0 = time.time()

d = collections.defaultdict(lambda: None)

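# Note: under Python 2, None compares lower than any number, so min() here
# always yields None; under Python 3 the comparison raises TypeError.
for key, value in sortedTuple:
    d[key]=min(d[key], value)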

print "\ndefaultdict (bad way) - SORTED:\t\t\t\t"+str(time.time()-t0)

#-------------------------------------  

t0 = time.time()

d = collections.defaultdict(lambda: None)

for key, value in unsortedTuple:
    d[key]=min(d[key], value)

print "defaultdict (bad way) - UNSORTED:\t\t\t"+str(time.time()-t0)

#-------------------------------------

t0 = time.time()

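d = collections.defaultdict(lambda: None) # Update: use list if we want to append values; here None is better.

# "d[key] or value" falls back to value when the key is new (None). It would also
# discard a stored 0, which cannot occur here because the values start at 1.
for key, value in sortedTuple:
    d[key]=min(d[key] or value, value)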

print "\ndefaultdict (nicer, Python3 compatible!) - SORTED:\t"+str(time.time()-t0)

#-------------------------------------  

t0 = time.time()

d = collections.defaultdict(lambda: None) # Update: use list if we want to append values; here None is better.

for key, value in unsortedTuple:
    d[key]=min(d[key] or value, value)

print "defaultdict (nicer, Python3 compatible!) - UNSORTED:\t"+str(time.time()-t0)

#-------------------------------------

t0 = time.time()

d = dict()

for key, value in sortedTuple:
    d[key]=min(d.get(key, value), value)

print "\ndict (using parameter) - SORTED:\t\t\t"+str(time.time()-t0)

#-------------------------------------  

t0 = time.time()

d = dict()

for key, value in unsortedTuple:
    d[key]=min(d.get(key, value), value)

print "dict (using parameter) - UNSORTED:\t\t\t"+str(time.time()-t0)

#-------------------------------------

t0 = time.time()

d = dict()

for key, value in sortedTuple:
    d[key]=min(d.get(key) or value, value)

print "\ndict (not using parameter) - SORTED:\t\t\t"+str(time.time()-t0)

#-------------------------------------  

t0 = time.time()

d = dict()

for key, value in unsortedTuple:
    d[key]=min(d.get(key) or value, value)

print "dict (not using parameter) - UNSORTED:\t\t\t"+str(time.time()-t0)

#-------------------------------------

groupby is faster than defaultdict when the tuple is already sorted, but slower when it is not. I got these timings (in seconds):

groupby - SORTED:                                       0.796000003815
groupby - UNSORTED:                                     4.63300013542

defaultdict (bad way) - SORTED:                         1.10599994659
defaultdict (bad way) - UNSORTED:                       1.96099996567

defaultdict (nicer, Python3 compatible!) - SORTED:      1.11000013351
defaultdict (nicer, Python3 compatible!) - UNSORTED:    1.95299983025

dict (using parameter) - SORTED:                        1.23400020599
dict (using parameter) - UNSORTED:                      2.09599995613

dict (not using parameter) - SORTED:                    1.14100003242
dict (not using parameter) - UNSORTED:                  1.98699998856
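For unsorted input, the dict-based variants avoid the sort entirely. The following is a minimal Python 3 sketch of that single-pass idea which collects the lowest and highest price together. The function name `min_max_by_id` and the sample pairs (including the id 7 with a price of 0.0) are hypothetical, chosen for illustration. It tests for membership instead of relying on `or`, so a stored price of 0 is handled correctly.

```python
def min_max_by_id(pairs):
    """Return {id: (lowest, highest)} in one pass, with no sorting."""
    result = {}
    for id_, price in pairs:
        if id_ in result:
            low, high = result[id_]
            result[id_] = (min(low, price), max(high, price))
        else:
            # First time this id is seen: it is both the lowest and the highest.
            result[id_] = (price, price)
    return result

# Hypothetical sample: three prices for one id, plus an id whose price is 0.0.
pairs = (
    (21611, 28402830.02),
    (21611, 23000007.0),
    (7, 0.0),
    (7, 12.5),
    (21611, 28403875.38),
)

extremes = min_max_by_id(pairs)
lowest = {id_: low for id_, (low, high) in extremes.items()}
highest = {id_: high for id_, (low, high) in extremes.items()}
```

Each pair is visited once, so the work grows linearly with the number of pairs, whereas the groupby version pays for a sort whenever the input is not already ordered by id.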