Computing the mean of a sparse matrix while ignoring zeros

Time: 2019-10-25 14:23:13

Tags: python scipy sparse-matrix

I have been trying to find a way to compute the column-wise mean of a sparse matrix while ignoring the zero entries. For a NumPy array, I can do this:

import numpy as np

# float dtype is needed: NaN cannot be stored in an integer array
arr = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0],
                [1, 0, 0, 0, 1, 0, 1, 0, 0],
                [0, 0, 0, 0, 1, 1, 0, 1, 0],
                [0, 1, 0, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 4],
                [0, 0, 0, 0, 0, 0, 0, 0, 5],
                [0, 0, 0, 0, 0, 0, 0, 0, 1]], dtype=float)
arr[arr == 0] = np.nan
means = np.nanmean(arr, axis=0)

Alternatively, I can do:

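from scipy.stats import tmean
# I don't understand why tmean with axis=1 doesn't work, so I ended up mapping over the columns
f = lambda x: tmean(x, (0, None), (False, None))
means = np.array(list(map(f, arr.T)))  # an array, so that .reshape works below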

Finally, I want to keep, in each column, only the values that are not below the column mean:

arr[arr<means.reshape(1,arr.shape[1])]=0

array([[1., 1., 1., 1., 0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 1., 0., 1., 0.],
       [0., 1., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 4.],
       [0., 0., 0., 0., 0., 0., 0., 0., 5.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.]])

How can I achieve this with a sparse matrix? arr_csr = csr_matrix(arr)

2 Answers:

Answer 0: (score: 0)

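SciPy sparse matrices have data and indices attributes. You can use the two together to compute the mean along the column axis, and then to check which values are above that mean. Example: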

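import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0,0,1,2,2,2])
col = np.array([0,2,2,0,1,2])
data = np.array([1,2,3,4,5,6])
m = csr_matrix((data,(row,col)),shape=(3,3))
# m = [[1,0,2],
#      [0,0,3],
#      [4,5,6]]
print(m.data)     # the stored (nonzero) values
print(m.indices)  # the column index of each stored value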

Edit: some background on the CSR format. Each entry in m.indices (say m.indices[i]) is the column index of the corresponding entry in m.data (m.data[i]).

The example is taken from the SciPy documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
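
The answer stops at printing the two attributes, so here is one way the idea might be carried through. This is a minimal sketch, not the answerer's own code: it uses np.bincount on m.indices to get per-column sums and counts of the stored entries, and it assumes the matrix holds no explicitly stored zeros (hence the eliminate_zeros() call). The names col_sums, col_counts and col_means are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype=float)
m = csr_matrix((data, (row, col)), shape=(3, 3))
m.eliminate_zeros()  # assumption: every stored entry should be a true nonzero

n_cols = m.shape[1]
# Sum and count the stored entries per column, using m.indices as the column labels
col_sums = np.bincount(m.indices, weights=m.data, minlength=n_cols)
col_counts = np.bincount(m.indices, minlength=n_cols)
# Columns without any stored entry get a mean of 0 instead of a division by zero
col_means = np.divide(col_sums, col_counts,
                      out=np.zeros(n_cols), where=col_counts > 0)

# Zero the stored values that fall below their column mean, then drop them from storage
m.data[m.data < col_means[m.indices]] = 0
m.eliminate_zeros()

print(col_means)
print(m.toarray())
```

Because only m.data and m.indices are touched, the matrix is never converted to a dense array; the cost scales with the number of stored entries.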

Answer 1: (score: 0)

In [339]: arr = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0], 
     ...:                 [1, 0, 0, 0, 1, 0, 1, 0, 0], 
     ...:                 [0, 0, 0, 0, 1, 1, 0, 1, 0], 
     ...:                 [0, 1, 0, 0, 0, 0, 0, 1, 0], 
     ...:                 [0, 0, 0, 0, 0, 0, 0, 0, 4], 
     ...:                 [0, 0, 0, 0, 0, 0, 0, 0, 5], 
     ...:                 [0, 0, 0, 0, 0, 0, 0, 0, 1]])                         
In [340]: sparse                                                                
Out[340]: <module 'scipy.sparse' from '/usr/local/lib/python3.6/dist-packages/scipy/sparse/__init__.py'>
In [341]: M =sparse.csr_matrix(arr)                                             
In [342]: M                                                                     
Out[342]: 
<7x9 sparse matrix of type '<class 'numpy.int64'>'
    with 17 stored elements in Compressed Sparse Row format>
In [343]: M.sum(axis=1)                                                         
Out[343]: 
matrix([[6],
        [3],
        [3],
        [2],
        [4],
        [5],
        [1]])
In [344]: M.getnnz(axis=1)                                                      
Out[344]: array([6, 3, 3, 2, 1, 1, 1], dtype=int32)
In [345]: M.sum(axis=1).A1/M.getnnz(axis=1)                                     
Out[345]: array([1., 1., 1., 1., 4., 5., 1.])
In [346]: M.mean(axis=1)                                                        
Out[346]: 
matrix([[0.66666667],
        [0.33333333],
        [0.33333333],
        [0.22222222],
        [0.44444444],
        [0.55555556],
        [0.11111111]])
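
The session above works per row (axis=1), whereas the question asks for per-column means. A sketch of the same sum / getnnz idea with axis=0, followed by the thresholding step from the question, might look like the following. It assumes there are no explicitly stored zeros (getnnz counts stored entries), and the variable names are illustrative.

```python
import numpy as np
from scipy import sparse

arr = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0],
                [1, 0, 0, 0, 1, 0, 1, 0, 0],
                [0, 0, 0, 0, 1, 1, 0, 1, 0],
                [0, 1, 0, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 4],
                [0, 0, 0, 0, 0, 0, 0, 0, 5],
                [0, 0, 0, 0, 0, 0, 0, 0, 1]], dtype=float)
M = sparse.csr_matrix(arr)

# Per-column sum divided by the per-column number of stored entries
counts = M.getnnz(axis=0)
sums = np.asarray(M.sum(axis=0)).ravel()
means = np.divide(sums, counts, out=np.zeros(M.shape[1]), where=counts > 0)

# Keep only the stored entries that are not below their column mean
C = M.tocoo()
keep = C.data >= means[C.col]
result = sparse.csr_matrix((C.data[keep], (C.row[keep], C.col[keep])),
                           shape=M.shape)

print(means)
print(result.toarray())
```

Going through COO format keeps the row and column index of every stored value at hand, so the filter is a single boolean mask. Columns with no stored entries are given a mean of 0 here, which is a choice of this sketch rather than anything the answer specifies.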