I have been trying to find a way to compute the column-wise mean of a sparse matrix while ignoring zero values. With a numpy array I can do this:
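import numpy as np

arr = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0],
                [1, 0, 0, 0, 1, 0, 1, 0, 0],
                [0, 0, 0, 0, 1, 1, 0, 1, 0],
                [0, 1, 0, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 4],
                [0, 0, 0, 0, 0, 0, 0, 0, 5],
                [0, 0, 0, 0, 0, 0, 0, 0, 1]], dtype=float)
# Replace zeros with NaN in a copy so nanmean skips them (an int array cannot hold NaN)
masked = np.where(arr == 0, np.nan, arr)
means = np.nanmean(masked, axis=0)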
Or I can do:
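from scipy.stats import tmean

# I don't understand why tmean with axis=1 doesn't work, so I ended up with this.
# limits=(0, None) with an exclusive lower bound keeps only values > 0.
f = lambda x: tmean(x, limits=(0, None), inclusive=(False, True))
means = np.array(list(map(f, arr.T)))  # one mean per column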
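Finally, in each column I want to keep only the values at or above that column's mean and set everything below it to 0: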
arr[arr < means] = 0  # means has one entry per column and broadcasts across the rows
array([[1., 1., 1., 1., 0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 1., 0., 1., 0.],
       [0., 1., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 4.],
       [0., 0., 0., 0., 0., 0., 0., 0., 5.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.]])
How can I achieve the same thing with a sparse matrix?
arr_csr = csr_matrix(arr)
Answer 0 (score: 0)
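scipy sparse matrices have data and indices attributes, and together they let you compute the mean along the column axis and then check which values are above that mean. Example: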
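import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
m = csr_matrix((data, (row, col)), shape=(3, 3))
# m = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 6]]
print(m.data)     # [1 2 3 4 5 6]
print(m.indices)  # [0 2 2 0 1 2]  (column index of each value in m.data)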
Edit: some background on the CSR format. Each entry of m.indices (say m.indices[i]) is the column index of the corresponding entry of m.data (m.data[i]).
The example is taken from the scipy documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
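Building on that, here is a minimal sketch of the whole task using only data and indices. It assumes the CSR matrix has no duplicate or explicitly stored zero entries (true for csr_matrix(arr) built from a dense array), and the helper names col_nonzero_mean and keep_at_or_above_mean are hypothetical. np.bincount groups the stored values by their column index to get per-column sums and counts; the same index array then looks up each value's column mean.

```python
import numpy as np
from scipy.sparse import csr_matrix


def col_nonzero_mean(m):
    """Mean of the stored (nonzero) entries in each column of a CSR matrix.

    Hypothetical helper; assumes no duplicate or explicitly stored zero entries.
    """
    ncols = m.shape[1]
    # m.indices[i] is the column of m.data[i], so bincount groups values by column
    sums = np.bincount(m.indices, weights=m.data, minlength=ncols)
    counts = np.bincount(m.indices, minlength=ncols)
    means = np.zeros(ncols)
    # Columns with no stored values keep a mean of 0 instead of dividing by zero
    np.divide(sums, counts, out=means, where=counts > 0)
    return means


def keep_at_or_above_mean(m):
    """Float copy of m with stored values below their column mean removed."""
    out = m.astype(float)  # astype returns a new matrix, m itself is untouched
    means = col_nonzero_mean(out)
    out.data[out.data < means[out.indices]] = 0
    out.eliminate_zeros()  # drop the entries that were just set to 0
    return out


arr = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0],
                [1, 0, 0, 0, 1, 0, 1, 0, 0],
                [0, 0, 0, 0, 1, 1, 0, 1, 0],
                [0, 1, 0, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 4],
                [0, 0, 0, 0, 0, 0, 0, 0, 5],
                [0, 0, 0, 0, 0, 0, 0, 0, 1]])
m = csr_matrix(arr)
means = col_nonzero_mean(m)
result = keep_at_or_above_mean(m)
print(means)
print(result.toarray())
```

Working on m.data in place avoids ever densifying the matrix; eliminate_zeros() keeps the result truly sparse afterwards.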
Answer 1 (score: 0)
In [339]: arr = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0],
     ...:                 [1, 0, 0, 0, 1, 0, 1, 0, 0],
     ...:                 [0, 0, 0, 0, 1, 1, 0, 1, 0],
     ...:                 [0, 1, 0, 0, 0, 0, 0, 1, 0],
     ...:                 [0, 0, 0, 0, 0, 0, 0, 0, 4],
     ...:                 [0, 0, 0, 0, 0, 0, 0, 0, 5],
     ...:                 [0, 0, 0, 0, 0, 0, 0, 0, 1]])
In [340]: sparse
Out[340]: <module 'scipy.sparse' from '/usr/local/lib/python3.6/dist-packages/scipy/sparse/__init__.py'>
In [341]: M =sparse.csr_matrix(arr)
In [342]: M
Out[342]:
<7x9 sparse matrix of type '<class 'numpy.int64'>'
with 17 stored elements in Compressed Sparse Row format>
In [343]: M.sum(axis=1)
Out[343]:
matrix([[6],
        [3],
        [3],
        [2],
        [4],
        [5],
        [1]])
In [344]: M.getnnz(axis=1)
Out[344]: array([6, 3, 3, 2, 1, 1, 1], dtype=int32)
In [345]: M.sum(axis=1).A1/M.getnnz(axis=1)
Out[345]: array([1., 1., 1., 1., 4., 5., 1.])
In [346]: M.mean(axis=1)
Out[346]:
matrix([[0.66666667],
        [0.33333333],
        [0.33333333],
        [0.22222222],
        [0.44444444],
        [0.55555556],
        [0.11111111]])
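The session above works along rows (axis=1), and its last step shows that M.mean(axis=1) divides by the full row length, zeros included. For the column-wise mean the question asks for, the same sum/getnnz idea should carry over with axis=0. The sketch below (variable names are illustrative) checks it against the dense np.nanmean approach and then keeps entries at or above their column mean by filtering the COO (row, col, data) triplets; it assumes every column has at least one nonzero.

```python
import numpy as np
from scipy import sparse

arr = np.array([[1, 1, 1, 1, 1, 1, 0, 0, 0],
                [1, 0, 0, 0, 1, 0, 1, 0, 0],
                [0, 0, 0, 0, 1, 1, 0, 1, 0],
                [0, 1, 0, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 4],
                [0, 0, 0, 0, 0, 0, 0, 0, 5],
                [0, 0, 0, 0, 0, 0, 0, 0, 1]])
M = sparse.csr_matrix(arr)

# Column sums divided by the number of stored (nonzero) entries per column
col_means = M.sum(axis=0).A1 / M.getnnz(axis=0)  # assumes no empty columns

# Dense reference: turn zeros into NaN so nanmean ignores them
dense_means = np.nanmean(np.where(arr == 0, np.nan, arr), axis=0)
print(np.allclose(col_means, dense_means))  # True

# Keep only stored values at or above their column mean, using the COO triplets
C = M.tocoo()
keep = C.data >= col_means[C.col]
filtered = sparse.csr_matrix((C.data[keep], (C.row[keep], C.col[keep])),
                             shape=M.shape)
print(filtered.toarray())
```

Rebuilding from the filtered triplets never stores the dropped entries at all, so no eliminate_zeros() call is needed.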