Question

我正在为通用的“字典计数器”寻找更高效的实现。目前，与collections.Counter实现

相比，这种天真的函数产生更快的结果

def uniqueCounter(x):
    dx = defaultdict(int)
    for i in x:
        dx[i] += 1
    return dx

编辑：一些特征样本输入：

c1= zip(np.random.randint(0,2,200000),np.random.randint(0,2,200000))
c2= np.random.randint(0,2,200000)

c1: 
uniqueCounter timing: 
10 loops, best of 3: 61.1 ms per loop
collections.Counter timing:
10 loops, best of 3: 113 ms per loop 

c2:
uniqueCounter timing: 10 loops, best of 3: 57 ms per loop
collections.Counter timing: 10 loops, best of 3: 120 ms per loop

Answer 1

尝试使用numpy.bincount

In [19]: Counter(c2)
Out[19]: Counter({1: 100226, 0: 99774})

In [20]: uniqueCounter(c2)
Out[20]: defaultdict(<type 'int'>, {0: 99774, 1: 100226})

In [21]: np.bincount(c2)
Out[21]: array([ 99774, 100226])

一些时间：

In [16]: %timeit np.bincount(c2)
1000 loops, best of 3: 2 ms per loop

In [17]: %timeit uniqueCounter(c2)
1 loops, best of 3: 161 ms per loop

In [18]: %timeit Counter(c2)
1 loops, best of 3: 362 ms per loop

最有效的字典计数器

1 个答案: