Question

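I have a list of unique integers, l1, extracted from the first element of each tuple in l2.

I am trying to do a kind of groupby on l2, keyed on the first element of each tuple (that is, on each item in the unique list), so that I can count how often each second element occurs among the tuples in l2 whose second element is in l3. See the example below.

To do this, I set up a dictionary for each item in the unique list and reset it after each loop. The dict keys are the values in l3.

My code works, but because of all the loops it is very slow when I have a large amount of data.

Is there any way to make it more efficient and faster?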

l1 = [1,2,3]
l2 = [(1,'a'),(3,'c'),(3,'b'),(2,'b'),(1,'a'),(3,'a')]
l3 = ['a','b']

for i in l1:
    d = dict.fromkeys(l3, 0)  # reset the count for every key in l3 to 0
    for t in l2:
        if i == t[0]:
            if t[1] in l3:
                d[t[1]] += 1
    print(d)

Example:

when i == 1:
d = {'a':2,'b':0}

Answer 1

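Make l3 a set for fast membership testing. Put all the l1-based counters in a dictionary; that way you don't need to loop over l1 at all, and can just use the t[0] value to pick the right counter: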

from collections import defaultdict

counts = {i: defaultdict(int) for i in l1}
s3 = set(l3)
for t0, t1 in l2:
    # only count if t1 is included in l3, and t0 is in l1
    if t1 not in s3 or t0 not in counts:
        continue
    counts[t0][t1] += 1

for d in counts.values():
    print(d)

This removes two multipliers, len(l1) and len(l3), so with N = len(l1), K = len(l2) and M = len(l3), an O(NKM) loop is now an O(K) loop.
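To make the saving concrete, here is a rough sketch that counts how often the innermost body runs in each version, using the sample data from the question. The function names are made up for illustration, and the extra cost of scanning the list l3 for membership is left out to keep it short.

```python
l1 = [1, 2, 3]
l2 = [(1, 'a'), (3, 'c'), (3, 'b'), (2, 'b'), (1, 'a'), (3, 'a')]


def nested_loop_steps(l1, l2):
    """Count inner-loop iterations of the nested approach."""
    steps = 0
    for i in l1:
        for t in l2:
            steps += 1  # one comparison of i against t[0]
    return steps


def single_pass_steps(l2):
    """Count iterations when l2 is walked exactly once."""
    steps = 0
    for t in l2:
        steps += 1  # one dictionary lookup keyed on t[0]
    return steps


print(nested_loop_steps(l1, l2))  # 18, i.e. len(l1) * len(l2)
print(single_pass_steps(l2))      # 6, i.e. len(l2)
```

The gap grows with len(l1), which is why the nested version gets so slow on large inputs.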

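This does increase the memory requirements, because you now need to keep track of len(l1) defaultdict objects. Allocating memory for those objects up front also takes some time.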
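One thing to be aware of: the defaultdict counters above only contain keys that were actually seen, whereas the expected output in the question lists every l3 key, zeros included. If that matters, a possible variant (just a sketch, and count_by_group is a name I made up) pre-fills each counter with the l3 keys and still makes a single pass over l2:

```python
def count_by_group(l1, l2, l3):
    """Return {group: {label: count}} with every label in l3 present."""
    # Pre-fill each group's dict so labels that never occur report 0.
    counts = {i: dict.fromkeys(l3, 0) for i in l1}
    for t0, t1 in l2:
        group = counts.get(t0)
        # Skip tuples whose group is not in l1 or whose label is not in l3.
        if group is None or t1 not in group:
            continue
        group[t1] += 1
    return counts


l1 = [1, 2, 3]
l2 = [(1, 'a'), (3, 'c'), (3, 'b'), (2, 'b'), (1, 'a'), (3, 'a')]
l3 = ['a', 'b']

result = count_by_group(l1, l2, l3)
print(result[1])  # {'a': 2, 'b': 0}
```

Here the pre-filled dict doubles as the membership test for l3, so no separate set is needed.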

Answer 2

I would use a defaultdict in combination with a Counter:

>>> from collections import defaultdict, Counter
>>> grouper = defaultdict(Counter)
>>> for n, c in l2:
...     grouper[n][c] += 1
...

Then you can query whatever you want:

>>> grouper[1]
Counter({'a': 2})
>>> grouper[2]
Counter({'b': 1})
>>> grouper[3]
Counter({'b': 1, 'c': 1, 'a': 1})
>>> grouper[3]['a']
1
>>> grouper[3]['b']
1
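Note that the session above counts every second element, including 'c', which is not in l3. If you only want the l3 values, one way (a sketch, with the variable name wanted being my own) is to filter while counting. A Counter returns 0 for a key it has never seen, so the zero entries from the question come for free:

```python
from collections import Counter, defaultdict

l2 = [(1, 'a'), (3, 'c'), (3, 'b'), (2, 'b'), (1, 'a'), (3, 'a')]
l3 = ['a', 'b']

wanted = set(l3)  # a set gives fast membership tests
grouper = defaultdict(Counter)
for n, c in l2:
    if c in wanted:  # ignore second elements that are not in l3
        grouper[n][c] += 1

# Missing keys read as 0 instead of raising KeyError.
print(grouper[1]['b'])  # 0
print(grouper[3]['c'])  # 0, because 'c' was filtered out

# Build the zero-filled dict from the question for one group.
print({k: grouper[1][k] for k in l3})  # {'a': 2, 'b': 0}
```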

Using a for loop to do a groupby in Python is too slow; is there a faster way?
