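I am looking for a more efficient implementation of a generic "dictionary counter". Currently, this naive function runs faster than the collections.Counter implementation:

    from collections import defaultdict

    def uniqueCounter(x):
        # Missing keys default to 0, so each element can be incremented directly
        dx = defaultdict(int)
        for i in x:
            dx[i] += 1
        return dx
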
Edit: some representative sample inputs:

    c1 = zip(np.random.randint(0,2,200000), np.random.randint(0,2,200000))
    c2 = np.random.randint(0,2,200000)

c1:
    uniqueCounter timing: 10 loops, best of 3: 61.1 ms per loop
    collections.Counter timing: 10 loops, best of 3: 113 ms per loop
c2:
    uniqueCounter timing: 10 loops, best of 3: 57 ms per loop
    collections.Counter timing: 10 loops, best of 3: 120 ms per loop
Answer 0 (score: 1)
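Try numpy.bincount. It counts occurrences of each value in a 1-D array of non-negative integers, so it applies directly to c2 but not to the tuples in c1: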

    In [19]: Counter(c2)
    Out[19]: Counter({1: 100226, 0: 99774})

    In [20]: uniqueCounter(c2)
    Out[20]: defaultdict(<type 'int'>, {0: 99774, 1: 100226})

    In [21]: np.bincount(c2)
    Out[21]: array([ 99774, 100226])

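Note that np.bincount returns an array indexed by value rather than a dictionary. If a dict-like result is needed, a thin wrapper along these lines might do. This is only a sketch: the name bincount_counter is hypothetical, it assumes Python 3 and an input of non-negative integers, and it has not been benchmarked against the timings in this thread.

```python
import numpy as np

def bincount_counter(values):
    """Count non-negative integers with np.bincount and return a plain dict.

    Hypothetical helper, not part of NumPy.
    """
    arr = np.asarray(values)
    counts = np.bincount(arr)          # counts[v] == number of times v occurs
    present = np.nonzero(counts)[0]    # keep only values that actually occur
    return {int(v): int(counts[v]) for v in present}

sample = np.array([0, 1, 1, 0, 1, 3])
result = bincount_counter(sample)
print(result)  # {0: 2, 1: 3, 3: 1}
```

The conversion to a dict costs time proportional to the largest value in the input, so for small value ranges such as c2 it should stay cheap.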
Some timings:

    In [16]: %timeit np.bincount(c2)
    1000 loops, best of 3: 2 ms per loop

    In [17]: %timeit uniqueCounter(c2)
    1 loops, best of 3: 161 ms per loop

    In [18]: %timeit Counter(c2)
    1 loops, best of 3: 362 ms per loop
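
np.bincount does not accept the pairs in c1. One possible way to stay inside NumPy for that case is np.unique with axis=0 and return_counts=True, which counts distinct rows of a 2-D array. The sketch below assumes Python 3 and NumPy 1.13 or later (for the axis argument). The name pair_counter is hypothetical, and no timing is claimed for it: np.unique sorts its input, so it may or may not beat the dictionary loop on a given data set.

```python
import numpy as np

def pair_counter(first, second):
    """Count (first[i], second[i]) pairs and return a dict keyed by tuple.

    Hypothetical helper; assumes two equal-length 1-D integer arrays.
    """
    pairs = np.column_stack((first, second))   # shape (n, 2), one pair per row
    uniq, counts = np.unique(pairs, axis=0, return_counts=True)
    return {tuple(int(v) for v in row): int(n) for row, n in zip(uniq, counts)}

a = np.array([0, 1, 1, 0, 1])
b = np.array([0, 1, 1, 1, 1])
pair_counts = pair_counter(a, b)
print(pair_counts)  # {(0, 0): 1, (0, 1): 1, (1, 1): 3}
```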