Question

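Given a two-dimensional numpy array M, I want to count the number of occurrences of each column of M. In other words, I am looking for a generalized version of bincount.

What I have tried so far: (1) convert the columns to tuples, (2) hash the tuples (via hash) to map them to natural numbers, (3) use numpy.bincount.

This seems clumsy. Does anyone know of a more elegant and efficient way?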

Answer 1

You can use collections.Counter:

>>> import numpy as np
>>> a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
...               [ 4,  5,  6,  8,  9,  5,  6,  7],
...               [ 8,  9, 10, 12, 13,  9, 10, 11]])
>>> from collections import Counter
>>> Counter(map(tuple, a.T))
Counter({(2, 6, 10): 2, (1, 5, 9): 2, (4, 8, 12): 1, (5, 9, 13): 1, (3, 7, 11):
1, (0, 4, 8): 1})
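
If the counts are needed as an array aligned with the original columns, the Counter can be looked up once per column. The following is a minimal sketch under that assumption; the name per_column is illustrative and not part of the original answer.

```python
from collections import Counter

import numpy as np

a = np.array([[0, 1, 2, 4, 5, 1, 2, 3],
              [4, 5, 6, 8, 9, 5, 6, 7],
              [8, 9, 10, 12, 13, 9, 10, 11]])

# Convert each column to a tuple of plain Python ints so it is hashable.
columns = [tuple(col) for col in a.T.tolist()]
counts = Counter(columns)

# per_column[i] is how often column i of `a` occurs in the whole array.
per_column = np.array([counts[col] for col in columns])
print(per_column)
```

This keeps the original column order, which the Counter's own display does not guarantee.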

Answer 2

Assuming:

a = np.array([[ 0,  1,  2,  4,  5,  1,  2,  3],
              [ 4,  5,  6,  8,  9,  5,  6,  7],
              [ 8,  9, 10, 12, 13,  9, 10, 11]])
b = np.transpose(a)

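A more efficient solution than hashing (it still requires some manipulation):

I create a view of the array with the flexible data type np.void (see here) so that each row becomes a single element. Converting to this shape allows np.unique to operate on it.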

%%timeit    
c = np.ascontiguousarray(b).view(np.dtype((np.void, b.dtype.itemsize*b.shape[1])))
_, idx, cnt = np.unique(c, return_index = True, return_counts = True)
# counts are in the last column; remember that b is the transpose of a
>>> np.concatenate((b[idx], cnt[:, None]), axis = 1)
array([[ 0,  4,  8,  1],
       [ 1,  5,  9,  2],
       [ 2,  6, 10,  2],
       [ 3,  7, 11,  1],
       [ 4,  8, 12,  1],
       [ 5,  9, 13,  1]])
10000 loops, best of 3: 65.4 µs per loop
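
As a possible alternative to the np.void view, NumPy 1.13 and later let np.unique take an axis argument, so whole columns can be treated as the items to deduplicate. This is a sketch assuming such a NumPy version is installed; it is not the method timed above.

```python
import numpy as np

a = np.array([[0, 1, 2, 4, 5, 1, 2, 3],
              [4, 5, 6, 8, 9, 5, 6, 7],
              [8, 9, 10, 12, 13, 9, 10, 11]])

# axis=1 makes np.unique compare entire columns instead of single elements.
unique_cols, counts = np.unique(a, axis=1, return_counts=True)

# Each column of unique_cols is a distinct column of a;
# counts[i] is the number of times it occurs.
for col, n in zip(unique_cols.T, counts):
    print(tuple(col.tolist()), n)
```

The unique columns come back in sorted order, so no transpose or contiguous copy is needed.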

For comparison, your hashing solution applied to the same array:

%%timeit
array_hash = [hash(tuple(row)) for row in b]
uniq, idx, cnt = np.unique(array_hash, return_index= True, return_counts = True)
np.concatenate((b[idx], cnt[:, None]), axis = 1)
10000 loops, best of 3: 89.5 µs per loop

Update: Eph's solution is the most efficient and the most elegant.

%%timeit
Counter(map(tuple, a.T))
10000 loops, best of 3: 38.3 µs per loop
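
The timings above were taken with IPython's %%timeit on a very small 3 x 8 array, and the ranking may differ for other sizes. Below is a rough sketch for repeating the comparison outside IPython with the standard timeit module. The test array big, its size, and the two helper functions are assumptions made for illustration, and np.unique with axis=1 requires NumPy 1.13 or later.

```python
import timeit
from collections import Counter

import numpy as np

# Hypothetical test data: 10,000 columns of 3 small integers each.
rng = np.random.default_rng(0)
big = rng.integers(0, 5, size=(3, 10_000))

def with_counter():
    # Count columns as tuples of Python ints.
    return Counter(map(tuple, big.T.tolist()))

def with_unique():
    # Count columns with np.unique over axis 1.
    return np.unique(big, axis=1, return_counts=True)

t_counter = timeit.timeit(with_counter, number=5)
t_unique = timeit.timeit(with_unique, number=5)
print(f"Counter: {t_counter:.4f} s, np.unique: {t_unique:.4f} s")
```

No expected timings are given here because they depend on the machine and the NumPy version.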

Count occurrences of columns in a numpy array
