Question

Suppose I have a matrix:

4 0 3 5
0 2 6 0
7 0 1 0

I want to binarize it into:

0 0 0 0
0 1 0 0
0 0 1 0

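With the threshold set to 2, any element greater than the threshold is set to 0, and any element less than or equal to the threshold (except 0) is set to 1.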

Can we do this on Python's csr_matrix or any other sparse matrix?

I know scikit-learn offers Binarizer, which replaces values less than or equal to the threshold with 0 and values above it with 1.

Answer 1

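When dealing with a sparse matrix s, avoid inequalities that include zero. A sparse matrix (if you are using it appropriately) should contain a great many zeros, and forming an array of all the locations that are zero would be huge. So avoid s <= 2, for example. Use inequalities that select away from zero instead.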

import numpy as np
from scipy import sparse

s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                [0, 2, 6, 0],
                                [7, 0, 1, 0]]))

print(repr(s))
# <3x4 sparse matrix of type '<class 'numpy.int64'>'
#   with 7 stored elements in Compressed Sparse Row format>
# (the exact wording of this summary varies between SciPy versions)

s[s > 2] = 0
s[s != 0] = 1

print(s.todense())

yields

matrix([[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]])
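
One way to follow the same advice without any sparse comparison at all is to work on the CSR `data` array, which holds only the stored values, so the implicit zeros are never touched. The following is a minimal sketch under that assumption; the helper name `binarize_at_most` is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy import sparse


def binarize_at_most(s, threshold):
    """Return a CSR copy: stored values that are nonzero and <= threshold become 1,
    all other stored values become 0 and are then dropped from storage."""
    out = s.tocsr(copy=True)                      # leave the caller's matrix untouched
    keep = (out.data != 0) & (out.data <= threshold)
    out.data = keep.astype(out.data.dtype)        # 1 where kept, 0 elsewhere
    out.eliminate_zeros()                         # remove the explicit zeros just created
    return out


s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                [0, 2, 6, 0],
                                [7, 0, 1, 0]]))
b = binarize_at_most(s, 2)
print(b.toarray())
```

Calling `eliminate_zeros()` matters here: assigning 0 to stored entries keeps them in the sparse structure as explicit zeros until they are removed.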

Answer 2

You can use numpy.where:

>>> import numpy as np
>>> import scipy.sparse
>>> mat = scipy.sparse.csr_matrix(np.array([[4, 0, 3, 5],
...                                         [0, 2, 6, 0],
...                                         [7, 0, 1, 0]])).todense()
>>> np.where(np.logical_and(mat <= 2, mat !=0), 1, 0)
matrix([[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]])
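
Note that `.todense()` materializes every zero, so this approach only suits matrices that fit in memory as dense arrays. A rough sketch of the same mask on a plain ndarray (via `toarray()`), with the result converted back to CSR, might look like this; the variable names are illustrative.

```python
import numpy as np
from scipy import sparse

s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                [0, 2, 6, 0],
                                [7, 0, 1, 0]]))

threshold = 2
dense = s.toarray()                           # plain ndarray; all zeros are materialized
mask = (dense <= threshold) & (dense != 0)    # True only for nonzero values <= threshold
binary = sparse.csr_matrix(mask.astype(dense.dtype))  # back to sparse; only the ones are stored
print(binary.toarray())
```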

Answer 3

There may be far more efficient ways, but this can be done with a simple function and list operations, as shown below:

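def binarized(matrix, threshold):
    # Modifies the nested list in place and returns it.
    for row in matrix:
        # Iterate over the columns of this row, not the number of rows.
        for each in range(len(row)):
            if row[each] > threshold:
                row[each] = 0
            elif row[each] != 0:
                row[each] = 1
    return matrix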


matrix = [[4, 0, 3, 5],
          [0, 2, 6, 0],
          [7, 0, 1, 0]]

print(binarized(matrix, 2))

Yields:

[[0, 0, 0, 0],
 [0, 1, 0, 0],
 [0, 0, 1, 0]]
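
If mutating the input list is not wanted, the same rule can be written as a nested list comprehension that builds a new list. This is only a sketch of an alternative; the name `binarized_copy` is hypothetical.

```python
def binarized_copy(matrix, threshold):
    """Return a new nested list; the input matrix is left unchanged."""
    return [[1 if value != 0 and value <= threshold else 0 for value in row]
            for row in matrix]


matrix = [[4, 0, 3, 5],
          [0, 2, 6, 0],
          [7, 0, 1, 0]]

result = binarized_copy(matrix, 2)
print(result)
```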

Answer 4

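import numpy as np

x = np.array([[4, 0, 3, 5],
              [0, 2, 6, 0],
              [7, 0, 1, 0]])

threshold = 2
x[x <= 0] = threshold + 1   # push zeros (and any negatives) above the threshold
x[x <= threshold] = 1       # remaining values at or below the threshold become 1
x[x > threshold] = 0        # everything else, including the former zeros, becomes 0
print(x)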

Output:

[[0 0 0 0]
 [0 1 0 0]
 [0 0 1 0]]

Binarize a sparse matrix in Python in a different way
