Question

I have a matrix of shape (10, 10000). For each column, I want a 1 at the index of the maximum value and a 0 everywhere else. Is there a way to do this without a for loop?

Answer 1

Here is one option using numpy. First import numpy and convert your matrix to a numpy array:

import numpy as np
my_mat = np.asarray(my_original_mat)

Now, using a small matrix as an example:

mat = np.random.randint(1, 10, size=(4, 4))
# array([[3, 9, 3, 1],
#        [1, 4, 2, 3],
#        [8, 4, 4, 2],
#        [7, 7, 3, 7]])
new_mat = np.zeros(mat.shape)  # our zeros and ones will go here
new_mat[np.argmax(mat, axis=0), np.arange(mat.shape[1])] = 1
# array([[0., 1., 0., 0.],
#        [0., 0., 0., 0.],
#        [1., 0., 1., 0.],
#        [0., 0., 0., 1.]])

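This relies on numpy integer (advanced) indexing rather than a loop. The line new_mat[np.argmax(...), np.arange(...)] pairs the row index of each column's maximum with that column's index, and sets those (row, column) positions to 1. It seems to work.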
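As a minimal sketch, the indexing step above can be wrapped in a function and applied to a matrix of the shape mentioned in the question. The function name one_hot_column_max, the integer output dtype, and the seeded random input are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

def one_hot_column_max(mat):
    """Return an array with 1 at each column's first maximum and 0 elsewhere."""
    mat = np.asarray(mat)
    out = np.zeros(mat.shape, dtype=int)   # integer output (an assumption; the answer uses floats)
    rows = np.argmax(mat, axis=0)          # row index of the first maximum in each column
    cols = np.arange(mat.shape[1])         # 0, 1, ..., n_cols - 1
    out[rows, cols] = 1                    # sets the positions (rows[j], cols[j]) for every j
    return out

# Hypothetical input with the shape from the question.
rng = np.random.default_rng(0)
big = rng.integers(1, 10, size=(10, 10000))
result = one_hot_column_max(big)

# Every column should contain exactly one 1.
print(result.shape, bool((result.sum(axis=0) == 1).all()))
```

Because rows and cols have the same length, numpy pairs them element by element, so a single assignment touches exactly one cell per column.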

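Note that if a column contains its maximum more than once, only the first (topmost) occurrence is set to 1, because np.argmax returns the index of the first maximum.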

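Another option, which gives a 1 at every maximum, including repeated ones (I see jdehesa beat me to this in the comments, but I repeat it here for completeness):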

(mat == mat.max(axis=0)).astype(mat.dtype)
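To make the difference between the two approaches concrete, here is a small sketch on a matrix with tied maxima. The array tied and the variable names are made up for this illustration.

```python
import numpy as np

# Column 0 has its maximum (5) in rows 0 and 1; column 1 has its maximum (2) in rows 1 and 2.
tied = np.array([[5, 1],
                 [5, 2],
                 [3, 2]])

# argmax approach: only the first (topmost) maximum in each column is marked.
first_only = np.zeros(tied.shape, dtype=tied.dtype)
first_only[np.argmax(tied, axis=0), np.arange(tied.shape[1])] = 1

# Comparison approach: tied.max(axis=0) has shape (2,) and is broadcast against
# every row, so every entry equal to its column maximum is marked.
all_maxima = (tied == tied.max(axis=0)).astype(tied.dtype)

print(first_only)
# [[1 0]
#  [0 1]
#  [0 0]]
print(all_maxima)
# [[1 0]
#  [1 1]
#  [0 1]]
```

Which one is appropriate depends on whether each column must contain exactly one 1 or whether all tied maxima should be flagged.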

Answer 2

Creating this matrix in sparse storage is actually very easy.

>>> import numpy as np
>>> from scipy.sparse import csc_matrix
>>> 
>>> m, n = 3, 7
>>> 
>>> data = np.random.randint(0, 10, (m, n))
>>> 
>>> data
array([[9, 0, 0, 7, 3, 1, 3],
       [8, 0, 4, 4, 3, 2, 4],
       [2, 3, 2, 5, 7, 5, 3]])
>>> 
>>> result = csc_matrix((np.ones(n), data.argmax(0), np.arange(n+1)), (m, n))
>>> result
<3x7 sparse matrix of type '<class 'numpy.float64'>'
        with 7 stored elements in Compressed Sparse Column format>
>>> result.toarray()
array([[1., 0., 0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 1.],
       [0., 1., 0., 0., 1., 1., 0.]])
