Question

What is a simple (and fast!) one-liner to list all the non-zero elements of a csr_matrix?

I am using this code:

edges_list = list([tuple(row) for row in np.transpose(A.nonzero())])
weight_list = [A[e] for e in edges_list]

But it takes quite a long time to execute.

Answer 1

For a CSR matrix in canonical format, access the data array directly:

A.data

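Note, however, that a matrix that is not in canonical format may contain explicit zeros or duplicate entries in its representation, and these need special handling. For example,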

# Merge duplicates and remove explicit zeros. Both operations modify A.
# We sum duplicates first because they might sum to zero - for example,
# if a 5 and a -5 are in the same spot, we have to sum them to 0 and then remove the 0.
A.sum_duplicates()
A.eliminate_zeros()

# Now use A.data
do_whatever_with(A.data)
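
To make the caveat above concrete, here is a minimal sketch that builds a deliberately non-canonical CSR matrix from raw (data, indices, indptr) arrays. The specific values are made up for illustration: a 5 and a -5 stored at the same position, plus one explicitly stored zero.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative, hand-built CSR arrays (not from the original answer):
# row 0 stores column 0 twice (5 and -5) and column 1 once (3);
# row 1 stores an explicit zero at column 2.
data = np.array([5, -5, 3, 0])
indices = np.array([0, 0, 1, 2])
indptr = np.array([0, 3, 4])
A = csr_matrix((data, indices, indptr), shape=(2, 3))

# Before cleanup, A.data still holds every stored entry.
raw = A.data.copy()

# Merge duplicates first (5 + -5 becomes a stored 0), then drop stored zeros.
A.sum_duplicates()
A.eliminate_zeros()

print(raw)     # stored entries before cleanup
print(A.data)  # stored entries after cleanup
```

After both calls, A.data should contain only the true non-zero values of the matrix, which is what the question asks for.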

Answer 2

You can use A.nonzero() to index directly into A:

In [19]: A = np.random.randint(0, 3, (3, 3))

In [20]: A
Out[20]: 
array([[2, 1, 1],
       [1, 2, 2],
       [0, 1, 0]])

In [21]: A[A.nonzero()]
Out[21]: array([2, 1, 1, 1, 2, 2, 1])

The result is the same as with your approach:

In [22]: edges_list = list([tuple(row) for row in np.transpose(A.nonzero())])

In [23]: [A[e] for e in edges_list]
Out[23]: [2, 1, 1, 1, 2, 2, 1]

It is clearly much faster (and the gap grows as the matrix gets larger):

In [25]: %timeit [A[e] for e in list([tuple(row) for row in np.transpose(A.nonzero())])]
10000 loops, best of 3: 48 µs per loop

In [26]: %timeit A[A.nonzero()]
100000 loops, best of 3: 10.7 µs per loop

This also works for a scipy csr_matrix, although there are better ways, as shown in the other answers:

In [30]: M = scipy.sparse.csr_matrix(A)

In [31]: M[M.nonzero()]
Out[31]: matrix([[2, 1, 1, 1, 2, 2, 1]], dtype=int32)
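
Note that indexing a csr_matrix this way returns a 1 x N matrix rather than a flat array. If a plain 1-D array is wanted, one possible approach (a sketch, using the same example values as above with a fixed A instead of a random one) is to convert and flatten the result:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Fixed copy of the example matrix so the result is reproducible.
A = np.array([[2, 1, 1],
              [1, 2, 2],
              [0, 1, 0]])
M = csr_matrix(A)

# Row and column indices of the non-zero entries.
rows, cols = M.nonzero()

# Fancy indexing returns a 1 x N matrix; flatten it to a 1-D ndarray.
values = np.asarray(M[rows, cols]).ravel()
print(values)
```

For a matrix built like this one, with no explicit zeros or duplicates, the flattened values should match M.data, which is cheaper to read.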

Answer 3

Just use A.data:

In [16]: from scipy.sparse import csr_matrix

In [17]: A = csr_matrix([[1,0,0],[0,2,0]])

In [18]: A.data
Out[18]: array([1, 2])

If the sparse matrix has been modified, or just to be safe, call A.eliminate_zeros() first:

In [19]: A[0,0] = 0

In [20]: A.data
Out[20]: array([0, 2])

In [21]: A.eliminate_zeros()

In [22]: A.data
Out[22]: array([2])

Answer 4

You can use scipy.sparse.find like this:

>>> from scipy.sparse import csr_matrix, find
>>> A = csr_matrix([[7.0, 8.0, 0],[0, 0, 9.0]])
>>> find(A)
(array([0, 0, 1], dtype=int32), array([0, 1, 2], dtype=int32), array([ 7.,  8.,  9.]))

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.find.html
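
Since find returns the row indices, column indices, and values together, it can also produce the edges_list and weight_list from the question without a Python-level lookup per element. A minimal sketch, reusing the variable names from the question:

```python
from scipy.sparse import csr_matrix, find

A = csr_matrix([[7.0, 8.0, 0], [0, 0, 9.0]])

# find returns three parallel arrays: row indices, column indices, values.
rows, cols, values = find(A)

# Pair up the indices as (row, col) tuples; weights stay aligned by position.
edges_list = list(zip(rows.tolist(), cols.tolist()))
weight_list = values.tolist()

for edge, weight in zip(edges_list, weight_list):
    print(edge, weight)
```

The two lists stay aligned with each other, so zip(edges_list, weight_list) pairs each edge with its weight.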

Listing the non-zero elements of a sparse matrix in Python

4 answers: