Question

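I have a 2D numpy array, A. I want to apply np.bincount() to each column of A to generate another 2D array, B, whose columns are the bincounts of the corresponding columns of A.

My problem is that np.bincount() is a function that takes a 1D array. It is not an array method like B = A.max(axis=1).

Is there a more pythonic / numpythic way to generate this B array, other than a nasty for loop?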

import numpy as np

states = 4
rows = 8
cols = 4

A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))

for x in range(A.shape[1]):
    B[:,x] =  np.bincount(A[:,x])

Answer 1

Using the same philosophy as in this post, here is a vectorized approach:

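m = A.shape[1]                 # number of columns
n = A.max() + 1                # number of bins per column
A1 = A + (n * np.arange(m))    # shift column j into its own range [n*j, n*(j+1))
out = np.bincount(A1.ravel(), minlength=n*m).reshape(m, -1).T  # shape (n, m)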
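
To see why the offset trick works, here is a small sketch that wraps it in a function and checks it against an explicit loop. Adding n*j to column j moves that column's values into a disjoint range, so a single flat bincount counts every column at once. The helper name bincount_per_column and the seeded test data are only illustrative, and the sketch assumes A holds non-negative integers.

```python
import numpy as np

def bincount_per_column(A, n=None):
    """Column-wise bincount via per-column offsets (assumes non-negative ints)."""
    A = np.asarray(A)
    m = A.shape[1]
    if n is None:
        n = A.max() + 1  # number of bins per column
    # Column j is shifted into the range [n*j, n*(j+1)).
    shifted = A + n * np.arange(m)
    # One flat bincount gives n bins per column; reshape and transpose to (n, m).
    return np.bincount(shifted.ravel(), minlength=n * m).reshape(m, n).T

rng = np.random.default_rng(0)
A = rng.integers(0, 4, size=(8, 4))
out = bincount_per_column(A, n=4)

# Reference: explicit loop, with minlength so every column has 4 bins.
ref = np.stack(
    [np.bincount(A[:, j], minlength=4) for j in range(A.shape[1])], axis=1
)
```

Passing n explicitly is a design choice in this sketch: it keeps the output shape fixed even if the largest state happens not to occur in A.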

Answer 2

I would suggest using np.apply_along_axis, which lets you apply a 1D function (np.bincount in this case) to 1D slices of a higher-dimensional array:

import numpy as np

states = 4
rows = 8
cols = 4

A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))

B = np.apply_along_axis(np.bincount, axis=0, arr=A)

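However, you have to be careful. This (as well as the suggested for-loop) only works if the output of np.bincount has the right shape. If the maximum state is missing from one or more columns of A, the bincount for those columns is shorter than expected, and the code fails with a ValueError.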
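
One way to sidestep that shape problem, sketched here under the assumption that the number of states is known up front, is to pass minlength so that every column yields the same number of bins. np.apply_along_axis forwards extra keyword arguments to the function it applies. The small array below is made up for illustration.

```python
import numpy as np

states = 4
# Column 1 never contains the maximum state 3, so its plain bincount is too short.
A = np.array([[0, 1, 3],
              [3, 2, 0],
              [1, 0, 3],
              [2, 1, 1]])

# Without minlength the per-column results have different lengths.
try:
    np.apply_along_axis(np.bincount, 0, A)
    failed = False
except ValueError:
    failed = True

# With minlength every column is padded to `states` bins.
B = np.apply_along_axis(np.bincount, 0, A, minlength=states)
```

The same minlength=states argument should also make the original for-loop robust, since each np.bincount call then returns exactly states entries.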

Answer 3

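A solution using the numpy_indexed package (disclaimer: I am its author) is fully vectorized, and so does not involve any python loops behind the scenes. Also, there are no restrictions on the input; not every column needs to contain the same set of unique values.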

import numpy as np
import numpy_indexed as npi
rowidx, colidx = np.indices(A.shape)
(bin, col), B = npi.count_table(A.flatten(), colidx.flatten())

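This gives an alternative (sparse) representation of the same result, which may be more appropriate if the B array does indeed contain many zeros: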

(bin, col), count = npi.count((A.flatten(), colidx.flatten()))

Note that apply_along_axis is just syntactic sugar for a for-loop, and has the same performance characteristics.

Answer 4

Yet another possibility:

import numpy as np


def bincount_columns(x, minlength=None):
    nbins = x.max() + 1
    if minlength is not None:
        nbins = max(nbins, minlength)
    ncols = x.shape[1]
    count = np.zeros((nbins, ncols), dtype=int)
    colidx = np.arange(ncols)[None, :]
    np.add.at(count, (x, colidx), 1)
    return count

For example,

In [110]: x
Out[110]: 
array([[4, 2, 2, 3],
       [4, 3, 4, 4],
       [4, 3, 4, 4],
       [0, 2, 4, 0],
       [4, 1, 2, 1],
       [4, 2, 4, 3]])

In [111]: bincount_columns(x)
Out[111]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2]])

In [112]: bincount_columns(x, minlength=7)
Out[112]: 
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
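
The reason bincount_columns relies on np.add.at rather than count[x, colidx] += 1 is that buffered fancy-index assignment applies the increment only once per distinct index, whereas np.add.at accumulates over repeated indices. A small sketch of the difference, using a made-up two-column array:

```python
import numpy as np

x = np.array([[1, 0],
              [1, 2],
              [1, 2]])
colidx = np.arange(x.shape[1])[None, :]  # broadcasts against x to pair each value with its column

# Buffered update: repeated (value, column) pairs collapse to a single +1.
buffered = np.zeros((3, 2), dtype=int)
buffered[x, colidx] += 1

# Unbuffered update: every occurrence is counted.
unbuffered = np.zeros((3, 2), dtype=int)
np.add.at(unbuffered, (x, colidx), 1)
```

Here the value 1 appears three times in column 0, so only the unbuffered version records a count of 3 in that cell.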

Vectorizing numpy bincount
