Question

When I build a scipy.sparse.csr_matrix by hand, can I use the uint32 type for indptr and indices? Will the matrix's dot method still return the correct answer?

The example below seems to work, but I'm not sure whether this is officially supported.

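import numpy as np
import scipy.sparse as spsp

# Random 1000x1000 0/1 matrix, roughly 10% ones
x = np.random.choice([0,1],size=(1000,1000), replace=True, p=[0.9,0.1])
x = x.astype(np.uint8)

# Build the CSR matrix, then recast its index arrays to uint32 by hand
x_csr = spsp.csr_matrix(x)
x_csr.indptr = x_csr.indptr.astype(np.uint32)
x_csr.indices = x_csr.indices.astype(np.uint32)

# Sparse and dense self-products
x_csr_selfdot = x_csr.dot(x_csr.T)
x_selfdot = x.dot(x.T)

# Count mismatching entries; 0 means the two products agree
print(np.sum(x_selfdot != x_csr_selfdot))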

x_csr.data is an array of 1s. SciPy won't let me replace the whole x_csr.data array with a single number.

Answer 1

I'm not sure what you are after. What you are doing (sort of) works:

In [237]: X=x_csr.dot(x_csr.T)

In [238]: np.allclose(X.A,x.dot(x.T))
Out[238]: True

That is, multiplication with the modified x_csr works.
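A minimal check of that claim, assuming a SciPy version where the hand-recast index arrays are still accepted (as in the session above): compare the product of a uint32-indexed copy with the product of a matrix whose indices SciPy chose itself, and with the dense result. The seed, size and density are arbitrary, and toarray() is used because the .A shorthand is not available on newer SciPy releases.

```python
import numpy as np
import scipy.sparse as spsp

# Reproducible random 0/1 matrix (size and density chosen arbitrarily)
rng = np.random.default_rng(0)
x = (rng.random((300, 300)) < 0.1).astype(np.int64)

# Reference: index arrays left exactly as SciPy created them
ref = spsp.csr_matrix(x)

# Same matrix, with index arrays recast to uint32 as in the question
mod = spsp.csr_matrix(x)
mod.indptr = mod.indptr.astype(np.uint32)
mod.indices = mod.indices.astype(np.uint32)

prod_ref = ref.dot(ref.T).toarray()
prod_mod = mod.dot(mod.T).toarray()
dense = x.dot(x.T)

# Both comparisons should print True
print(np.array_equal(prod_mod, prod_ref), np.array_equal(prod_mod, dense))
```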

But note that any operation on x_csr produces a new sparse matrix whose indices are back to int32:

In [240]: x_csr.indptr
Out[240]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=uint32)

In [241]: x_csr.T.indptr
Out[241]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=int32)

In [242]: X.indptr
Out[242]: array([     0,   1000,   2000, ..., 997962, 998962, 999962], dtype=int32)

In [260]: x_csr[:].indptr
Out[260]: array([    0,   112,   216, ..., 99652, 99751, 99853], dtype=int32)

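The dtype of .data is preserved, but when sparse creates a new matrix it generates its own indptr and indices arrays. It does not try to match the dtype of the originals.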
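A small sketch of the same observation, using .copy() in place of x_csr[:]. Which signed type comes back (int32 or int64) may differ between SciPy versions, so the sketch only prints the dtypes instead of relying on one in particular.

```python
import numpy as np
import scipy.sparse as spsp

rng = np.random.default_rng(1)
x = (rng.random((50, 50)) < 0.2).astype(np.uint8)

# Hand-recast index arrays, as in the question
x_csr = spsp.csr_matrix(x)
x_csr.indptr = x_csr.indptr.astype(np.uint32)
x_csr.indices = x_csr.indices.astype(np.uint32)

# Each of these builds a new matrix with freshly chosen index arrays
derived = {
    "transpose": x_csr.T,
    "product": x_csr.dot(x_csr.T),
    "copy": x_csr.copy(),
}

print("original", x_csr.indptr.dtype, x_csr.indices.dtype)
for name, m in derived.items():
    print(name, m.indptr.dtype, m.indices.dtype)
```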

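Yes, the data attribute must hold one value for every nonzero element of the matrix, so data is the same size as indices. In coo format, row and col also match data in size.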
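A sketch of that size relationship on a tiny 3x3 matrix chosen for illustration. If the goal is to make every stored value 1, filling the existing data array in place keeps the one-value-per-nonzero length the format requires; replacing the attribute with a scalar would not.

```python
import numpy as np
import scipy.sparse as spsp

# Tiny 0/1 matrix with 4 nonzeros (illustrative values)
x = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 0, 1]], dtype=np.uint8)
m = spsp.csr_matrix(x)

# One stored value per nonzero: data and indices both have length nnz
print(m.nnz, m.data.size, m.indices.size)   # 4 4 4

# Overwrite every stored value with 1 in place; the array keeps its length
m.data[:] = 1

# COO keeps parallel row, col and data arrays of that same length
c = m.tocoo()
print(c.row.size, c.col.size, c.data.size)  # 4 4 4
```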

Also, print(x_csr) raises an error inside x_csr.tocoo():

--> 931         _sparsetools.expandptr(major_dim,self.indptr,major_indices)
ValueError: Output dtype not compatible with inputs.

In general, don't try to fiddle with the indptr and indices of a csr matrix. Let the sparse code take care of those.
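If index arrays arrive as uint32 from somewhere else, one way to follow this advice (an illustration, not the only option) is to pass the raw (data, indices, indptr) triple to the csr_matrix constructor and let it pick the index dtype. The arrays below are made up for the example.

```python
import numpy as np
import scipy.sparse as spsp

# Hypothetical raw CSR pieces that happen to be stored as uint32
data = np.ones(4, dtype=np.uint8)
indices = np.array([1, 0, 2, 2], dtype=np.uint32)
indptr = np.array([0, 1, 3, 4], dtype=np.uint32)

# The constructor validates the arrays and chooses a signed index dtype
m = spsp.csr_matrix((data, indices, indptr), shape=(3, 3))
print(m.indptr.dtype, m.indices.dtype)

# Conversions that failed on the hand-patched matrix work here
print(m)
print(m.toarray())
```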

=====================

x_csr.dot is performed by x_csr.__mul__ and, when other is sparse, by x_csr._mul_sparse_matrix(self, other). This uses sparse.sputils.get_index_dtype to determine the dtype of the returned matrix's indices. It chooses a "Suitable index data type (int32 or int64)".
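To make the "int32 or int64" rule concrete, here is a simplified stand-in. choose_index_dtype is a hypothetical name: it approximates the behaviour described above and is not SciPy's get_index_dtype, whose details differ between versions.

```python
import numpy as np

INT32_MIN = np.iinfo(np.int32).min
INT32_MAX = np.iinfo(np.int32).max


def choose_index_dtype(arrays=(), maxval=None):
    """Hypothetical helper: np.int32 if every index fits, else np.int64.

    A simplified illustration of the rule, not SciPy's actual code.
    """
    # An explicit bound (e.g. a dimension or nnz) beyond int32 forces int64
    if maxval is not None and maxval > INT32_MAX:
        return np.int64
    for arr in arrays:
        arr = np.asarray(arr)
        # Empty arrays and dtypes that cast safely to int32 never force int64
        if arr.size == 0 or np.can_cast(arr.dtype, np.int32):
            continue
        # Wider integer dtypes (uint32, int64, ...) are fine if the values fit
        if np.issubdtype(arr.dtype, np.integer):
            if int(arr.min()) >= INT32_MIN and int(arr.max()) <= INT32_MAX:
                continue
        return np.int64
    return np.int32


small_u32 = np.array([0, 112, 216], dtype=np.uint32)
big_i64 = np.array([0, 2**40], dtype=np.int64)
print(choose_index_dtype([small_u32]).__name__)           # int32
print(choose_index_dtype([small_u32, big_i64]).__name__)  # int64
print(choose_index_dtype(maxval=2**31).__name__)          # int64
```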

_mul_sparse_matrix also converts all the inputs to this dtype, with lines like:

np.asarray(self.indptr, dtype=idx_dtype),

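So your attempt to change the x_csr.indptr dtype does not change how the calculation is done. Also note that after all this preparation, the actual multiplication is performed in compiled C++ code (csr_matmat_pass1, csr_matmat_pass2).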
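The np.asarray(..., dtype=idx_dtype) step is why the recast has no lasting effect on the computation: when the dtype differs, NumPy builds a converted array, and when it already matches, it returns the input unchanged. A quick NumPy-only illustration with made-up index values:

```python
import numpy as np

idx_u32 = np.array([0, 112, 216], dtype=np.uint32)
idx_i32 = np.array([0, 112, 216], dtype=np.int32)

# dtype differs: asarray must build a new, converted array
conv = np.asarray(idx_u32, dtype=np.int32)
# dtype already matches: asarray hands back the very same object
same = np.asarray(idx_i32, dtype=np.int32)

print(conv.dtype, np.shares_memory(conv, idx_u32))  # int32 False
print(same is idx_i32)                               # True
```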

Scipy Sparse Matrix built with non-int64 (indptr, indices)
