Question

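I want to get the row indices of the True values in each column of a 2-D ndarray. So far I have a solution that uses a for loop, but I don't think it is efficient because it relies on a native Python for-loop. I tried to come up with a vectorized solution but failed.

Update: the solution doesn't have to be vectorized; the more efficient, the better.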

import numpy as np

arr = np.random.randint(2, size=15).reshape((3,5)).astype(bool)
print(arr)

[[ True False  True False  True]
 [False  True False  True  True]
 [ True  True False False  True]]

def calc(matrix):
    result = []
    for i in range(matrix.shape[1]):
        result.append(np.argwhere(matrix[:, i]).flatten().tolist())
    return result

print(calc(arr))
[[0, 2], [1, 2], [0], [1], [0, 1, 2]]

Note: I want the row indices grouped by column. When a column is all False, I need an empty list [] for it rather than having it skipped.

Answer 1

Approach #1

Here's a vectorized NumPy approach that gets those row indices grouped into a list of arrays -

r,c = np.where(arr.T)
out = np.split(c, np.flatnonzero(r[1:] != r[:-1])+1)

Sample run -

In [63]: arr = np.random.randint(2, size=15).reshape((3,5)).astype(bool)

In [64]: arr
Out[64]: 
array([[False, False,  True,  True, False],
       [ True,  True, False, False,  True],
       [ True,  True, False, False,  True]], dtype=bool)

In [65]: r,c = np.where(arr.T)

In [66]: np.split(c, np.flatnonzero(r[1:] != r[:-1])+1)
Out[66]: [array([1, 2]), array([1, 2]), array([0]), array([0]), array([1, 2])]

In [67]: calc(arr)
Out[67]: [[1, 2], [1, 2], [0], [0], [1, 2]]
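
To make the splitting step easier to follow, here is a small self-contained sketch that prints the intermediate arrays. It uses a fixed array (the values printed in the question) rather than a random one, and the names `cuts`, `groups` and `arr2` are illustrative only. It also shows a limitation worth keeping in mind: a column with no True values contributes nothing to `r`, so it produces no group at all.

```python
import numpy as np

# Fixed example array (the values printed in the question).
arr = np.array([[True, False, True, False, True],
                [False, True, False, True, True],
                [True, True, False, False, True]])

# np.where on the transpose lists the True positions column by column:
# r holds the column index of each True value, c the matching row index.
r, c = np.where(arr.T)
print(r)  # [0 0 1 1 2 3 4 4 4]
print(c)  # [0 2 1 2 0 1 0 1 2]

# A new group starts wherever the column index changes.
cuts = np.flatnonzero(r[1:] != r[:-1]) + 1
print(cuts)  # [2 4 5 6]

groups = [g.tolist() for g in np.split(c, cuts)]
print(groups)  # [[0, 2], [1, 2], [0], [1], [0, 1, 2]]

# Limitation: an all-False column simply disappears from the result.
arr2 = arr.copy()
arr2[:, 1] = False
r2, c2 = np.where(arr2.T)
groups2 = [g.tolist() for g in np.split(c2, np.flatnonzero(r2[1:] != r2[:-1]) + 1)]
print(groups2)  # [[0, 2], [0], [1], [0, 1, 2]] -> only 4 groups for 5 columns
```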

Approach #2

Alternatively, we could use a list comprehension to avoid the splitting -

idx = np.concatenate(([0], np.flatnonzero(r[1:] != r[:-1])+1, [r.size] ))
out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]

We are using r,c from approach #1.
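
As a self-contained sketch, the slicing variant can be wrapped in a function. The function name `group_rows_by_slicing` is hypothetical, and the array below is the one from the Approach #1 sample run, typed in by hand. Like Approach #1, this version produces no entry for a column that is entirely False.

```python
import numpy as np

def group_rows_by_slicing(arr):
    """Group row indices of True values by column using plain slices.

    Columns without any True value are not represented in the output.
    """
    r, c = np.where(arr.T)
    # Start offsets of each group, plus the final end offset.
    idx = np.concatenate(([0], np.flatnonzero(r[1:] != r[:-1]) + 1, [r.size]))
    return [c[idx[i]:idx[i + 1]] for i in range(len(idx) - 1)]

arr = np.array([[False, False, True, True, False],
                [True, True, False, False, True],
                [True, True, False, False, True]])

out = group_rows_by_slicing(arr)
print([a.tolist() for a in out])  # [[1, 2], [1, 2], [0], [0], [1, 2]]
```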

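Approach #3 (empty lists/arrays for all-False columns)

To account for all-False columns, for which we need empty lists/arrays, here's a modified approach -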

idx = np.concatenate(([0], arr.sum(0).cumsum() ))
out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]

We are using c from approach #1.

Sample run -

In [177]: arr
Out[177]: 
array([[ True, False, False, False, False],
       [ True, False, False, False,  True],
       [ True, False,  True, False,  True]], dtype=bool)

In [178]: idx = np.concatenate(([0], arr.sum(0).cumsum() ))
     ...: out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
     ...: 

In [179]: out
Out[179]: 
[array([0, 1, 2]),
 array([], dtype=int64),
 array([2]),
 array([], dtype=int64),
 array([1, 2])]
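
The reason this handles empty columns: `arr.sum(0)` counts the True values in each column, and because `c` is ordered column by column, the cumulative sum of those counts gives the end offset of every column's slice. A column with a count of zero gets identical start and end offsets, so its slice is empty. A minimal self-contained sketch follows, with a hypothetical function name `rows_per_column` and the sample array above typed in by hand; it converts each slice to a list to match the output format requested in the question.

```python
import numpy as np

def rows_per_column(arr):
    """Return, for every column, the list of row indices holding True.

    All-False columns yield an empty list instead of being skipped.
    """
    c = np.where(arr.T)[1]        # row indices, ordered column by column
    counts = arr.sum(axis=0)      # number of True values per column
    idx = np.concatenate(([0], counts.cumsum()))  # slice boundaries into c
    return [c[idx[i]:idx[i + 1]].tolist() for i in range(arr.shape[1])]

arr = np.array([[True, False, False, False, False],
                [True, False, False, False, True],
                [True, False, True, False, True]])

result = rows_per_column(arr)
print(result)  # [[0, 1, 2], [], [2], [], [1, 2]]
```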

Approach #4

Here's another way to handle all-False columns -

unq, IDs = np.unique(r, return_index=True)
idx = np.concatenate(( IDs, [r.size] ))
# Build a separate empty list per column; [[]]*n would repeat one shared list
out = [[] for _ in range(arr.shape[1])]
for i,item in enumerate(unq):
    out[item] = c[idx[i]:idx[i+1]]

We are using r,c from approach #1.
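
Put together as a runnable sketch (the function name `rows_per_column_unique` is hypothetical, and the array is the Approach #3 sample typed in by hand): `np.unique` with `return_index=True` reports which columns contain at least one True value and where each of their groups starts in `c`. Only those columns are overwritten, so the remaining placeholders stay empty. The placeholders are created with a list comprehension so that each one is a distinct list object.

```python
import numpy as np

def rows_per_column_unique(arr):
    """Row indices of True values per column; empty list for all-False columns."""
    r, c = np.where(arr.T)
    # unq: columns that contain a True value; ids: start of each group in c.
    unq, ids = np.unique(r, return_index=True)
    idx = np.concatenate((ids, [r.size]))
    out = [[] for _ in range(arr.shape[1])]  # one distinct placeholder per column
    for i, col in enumerate(unq):
        out[col] = c[idx[i]:idx[i + 1]].tolist()
    return out

arr = np.array([[True, False, False, False, False],
                [True, False, False, False, True],
                [True, False, True, False, True]])

result = rows_per_column_unique(arr)
print(result)  # [[0, 1, 2], [], [2], [], [1, 2]]
```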

Answer 2

My solution is

collumn, row = np.where(arr.T)
unique, indices = np.unique(collumn, return_index=True)
np.split(row, indices[1:])

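It is slightly slower than what @Divakar proposed, but I find it more readable because it avoids the complicated np.flatnonzero(r[1:] != r[:-1])+1 part, so it is quicker to see what is going on.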
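
A small sketch of how this behaves when a column has no True values, assuming the same requirement as in the question (an empty list for such columns). `np.split` alone returns one group per non-empty column, so the example pairs each group with its column index from `np.unique` and fills the gaps afterwards. The names `by_col` and `result` are illustrative and not part of the answer.

```python
import numpy as np

arr = np.array([[True, False, False, False, False],
                [True, False, False, False, True],
                [True, False, True, False, True]])

column, row = np.where(arr.T)
# unique: columns with at least one True; indices: first position of each.
unique, indices = np.unique(column, return_index=True)
groups = np.split(row, indices[1:])
print([g.tolist() for g in groups])  # [[0, 1, 2], [2], [1, 2]] -> columns 1 and 3 missing

# Map each group back to its column, then fill in empty lists for the rest.
by_col = {int(col): g.tolist() for col, g in zip(unique, groups)}
result = [by_col.get(j, []) for j in range(arr.shape[1])]
print(result)  # [[0, 1, 2], [], [2], [], [1, 2]]
```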

Most efficient way to get the row indices of True values column-wise in numpy
