Question

I'm having a hard time understanding why this behavior occurs.

I have a scipy sparse CSR matrix. Its first ten rows are:

print(my_mat[0:10,])

  (0, 31)       1
  (0, 33)       1
  (1, 36)       1
  (1, 40)       1
  (2, 47)       1
  (2, 48)       1
  (3, 50)       1
  (3, 53)       1
  (4, 58)       1
  (4, 60)       1
  (5, 66)       1
  (5, 68)       1
  (6, 73)       1
  (6, 75)       1
  (7, 77)       1
  (7, 82)       1
  (8, 30)       1
  (8, 32)       1
  (9, 37)       1
  (9, 40)       1

When I look at indptr, I get:

m1 = my_mat[0:10,]
print(m1.indptr)
[ 0  2  4  6  8 10 12 14 16 18 20]

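Why aren't the values of indptr equal to:

0 0 1 1 2 2 3 3 etc. (the first column of my_mat, which is what the accepted answer to this question implies)? How can I access those values?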

Answer 1

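For a CSR matrix, m1.indptr does not hold row indices. Instead, for row r, the pair of values start, end = m1.indptr[r:r+2] gives the start and end positions in m1.data of the values stored in row r. That is, m1.data[start:end] holds the nonzero values in row r. The columns of those values are in m1.indices[start:end].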

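In your example, you have m1.indptr = [ 0 2 4 6 8 10 12 14 16 18 20]. So the nonzero values in the first row are stored in m1.data[0:2], and the columns of those values are stored in m1.indices[0:2]. The nonzero values in the second row are m1.data[2:4], with columns m1.indices[2:4], and so on.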
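To make the indptr layout concrete, here is a minimal sketch that walks a small CSR matrix row by row. The 3x6 matrix and the names m and row_entries are made up for illustration; they are not the asker's my_mat.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x6 matrix with two nonzeros per row (not the asker's data).
rows = np.array([0, 0, 1, 1, 2, 2])
cols = np.array([1, 3, 0, 4, 2, 5])
vals = np.array([1, 1, 1, 1, 1, 1])
m = csr_matrix((vals, (rows, cols)), shape=(3, 6))

row_entries = []
for r in range(m.shape[0]):
    # indptr[r] and indptr[r + 1] delimit row r's slice of indices and data.
    start, end = m.indptr[r], m.indptr[r + 1]
    row_cols = m.indices[start:end].tolist()
    row_vals = m.data[start:end].tolist()
    row_entries.append((r, row_cols, row_vals))
    print(r, row_cols, row_vals)
```

Because every row here stores exactly two values, indptr increases in steps of two, just like the [0 2 4 6 ...] pattern in the question.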

If you want the row and column indices, probably the simplest approach is to use the nonzero() method. For example, here is a CSR matrix:

In [50]: s
Out[50]: 
<5x8 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in Compressed Sparse Row format>

In [51]: s.toarray()
Out[51]: 
array([[ 0, 10, 40,  0,  0, 20,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 30,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0]], dtype=int64)

Here we use the nonzero() method to get the row and column indices of the nonzero values:

In [71]: row, col = s.nonzero()

In [72]: row
Out[72]: array([0, 0, 0, 2], dtype=int32)

In [73]: col
Out[73]: array([1, 2, 5, 3], dtype=int32)

Alternatively, you can convert the matrix to "COO" (coordinate) format. You can then access the row and col attributes:

In [52]: c = s.tocoo()

In [53]: c.row
Out[53]: array([0, 0, 0, 2], dtype=int32)

In [54]: c.col
Out[54]: array([1, 2, 5, 3], dtype=int32)
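If you would rather derive the row indices from indptr itself, one possible approach (a sketch, not something the answer above prescribes) is to expand the per-row counts with np.diff and np.repeat. The matrix below rebuilds the 5x8 example; the names counts and row_from_indptr are illustrative. Note that this yields one index per stored entry, so it agrees with nonzero() only when no explicit zeros are stored.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Rebuild the 5x8 example matrix from its nonzero entries.
data = np.array([10, 40, 20, 30])
row = np.array([0, 0, 0, 2])
col = np.array([1, 2, 5, 3])
s = csr_matrix((data, (row, col)), shape=(5, 8))

# np.diff(indptr) is the number of stored entries in each row.
counts = np.diff(s.indptr)

# Repeat each row number by its count: one row index per stored entry.
row_from_indptr = np.repeat(np.arange(s.shape[0]), counts)

print(s.indptr)
print(row_from_indptr)
```

For this matrix the result matches the row attribute of the COO form, and s.indices supplies the corresponding column indices.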

Why does indptr not match the values in this csr matrix?
