Question

I can construct a sparse matrix (for example csr) like this:

import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr_matrix((data, (row, col)), shape=(3, 3))

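Given my input data, this is the simplest and most natural way.

Now data may change, while the structural information row and col stays the same.

How can I efficiently set new data without repeating the whole construction process?

(I expect I would somehow have to obtain a linear index map from data into the internal matrix storage, but I don't know whether one is provided.)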

Answer 1

You can assign to .data:

sparse_matrix.data = new_data[np.lexsort([col, row])]

Sample run:

In [109]: row = np.array([2, 0, 0, 1, 2, 2])
     ...: col = np.array([2, 0, 2, 2, 0, 1])
     ...: data = np.array([1, 2, 3, 4, 5, 6])
     ...: mat1 = csr_matrix((data, (row, col)), shape=(3, 3))
     ...: 

In [110]: mat1.toarray()
Out[110]: 
array([[2, 0, 3],
       [0, 0, 4],
       [5, 6, 1]])

In [111]: new_data = np.array([23,90,45,67,21,99])

In [112]: mat1.data = new_data[np.lexsort([col, row])]

In [113]: mat1.toarray()
Out[113]: 
array([[90,  0, 45],
       [ 0,  0, 67],
       [21, 99, 23]])

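If it turns out to be faster, we can replace np.lexsort([col, row]) with np.ravel_multi_index((row, col), mat1.shape).argsort(). Both variants assume that no (row, col) pair is repeated, because csr_matrix sums duplicate entries.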
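Since row and col stay fixed, the permutation only needs to be computed once. The following is a minimal sketch of that idea, not part of the original answer: set_values is a hypothetical helper name, and the uniqueness check encodes the assumption that no coordinate is repeated.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([2, 0, 0, 1, 2, 2])
col = np.array([2, 0, 2, 2, 0, 1])
data = np.array([1, 2, 3, 4, 5, 6])
shape = (3, 3)

# Linear index of every (row, col) pair. Assumption: no pair occurs twice,
# otherwise csr_matrix sums the duplicates and the mapping below is invalid.
flat = np.ravel_multi_index((row, col), shape)
assert np.unique(flat).size == flat.size

# Permutation from input order to row-major (CSR) order, computed once.
perm = np.lexsort([col, row])
# The ravel_multi_index variant gives the same permutation here.
assert np.array_equal(perm, flat.argsort(kind="stable"))

mat = csr_matrix((data, (row, col)), shape=shape)


def set_values(matrix, values, order):
    """Overwrite the stored values in place (hypothetical helper)."""
    matrix.data[:] = np.asarray(values)[order]


set_values(mat, [23, 90, 45, 67, 21, 99], perm)
print(mat.toarray())
```

Writing through mat.data[:] keeps the existing array and its dtype, so float values assigned into an integer matrix would be truncated; assigning a new array to mat.data, as in the answer, avoids that.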

Answer 2

Usually, when a sparse coo matrix is made this way, its attributes are simply the input arrays.

In [241]: Mo = sparse.coo_matrix((data, (row, col)), shape=(3,3))
In [242]: Mo.data
Out[242]: array([1, 2, 3, 4, 5, 6])
In [243]: Mo.row
Out[243]: array([0, 0, 1, 2, 2, 2])
In [244]: Mo.col
Out[244]: array([0, 2, 2, 0, 1, 2])
In [245]: id(Mo.col), id(col)       # ids match
Out[245]: (2851595992, 2851595992)

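When converting to csr (or calling csr_matrix), all three are transformed. This is done in compiled code. Duplicate coordinates are summed (as documented). Also note the new dtypes: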
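The summing of duplicates matters for the question, because the number of stored values can shrink during conversion. A small illustration, using made-up input in which the coordinate (0, 0) appears twice:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative input: the coordinate (0, 0) appears twice.
row = np.array([0, 0, 1])
col = np.array([0, 0, 1])
data = np.array([1, 2, 3])

Mo = coo_matrix((data, (row, col)), shape=(2, 2))
Mr = Mo.tocsr()

print(Mo.data.size)  # number of values stored in the coo matrix
print(Mr.data)       # values stored after conversion to csr
print(Mr.toarray())
```

In such a case there is no one-to-one map from the input data to Mr.data, so replacing .data with a permuted copy of the input no longer applies.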

In [246]: Mr = Mo.tocsr()
In [247]: Mr.data
Out[247]: array([1, 2, 3, 4, 5, 6], dtype=int32)
In [248]: Mr.indices
Out[248]: array([0, 2, 2, 0, 1, 2], dtype=int32)
In [249]: Mr.indptr
Out[249]: array([0, 2, 3, 6], dtype=int32)

The coo values can be in any order; the csr ones are sorted. In your example that did not change the order (indptr replaces row with a condensed form).
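To make the "condensed form" concrete, here is a short sketch that expands indptr back into one row index per stored value. The variable names counts and expanded_row are only illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
Mr = csr_matrix((data, (row, col)), shape=(3, 3))

# indptr[i]:indptr[i + 1] is the slice of data/indices belonging to row i,
# so consecutive differences give the number of stored values per row.
counts = np.diff(Mr.indptr)

# Repeating each row number by its count recovers the coo-style row array.
expanded_row = np.repeat(np.arange(Mr.shape[0]), counts)
print(counts, expanded_row)
```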

You can check the csr format with:

In [266]: Mr.has_canonical_format
Out[266]: True
In [267]: Mr.has_sorted_indices
Out[267]: True

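If these are not True, you can restore them with Mr.sort_indices() (or with Mr.sum_duplicates(), which also merges duplicate entries).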
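As a rough illustration of what sort_indices does, the sketch below builds a csr matrix directly from (data, indices, indptr) with deliberately unsorted column indices in the first row. The arrays are made up for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Row 0 stores column 2 before column 0, so the indices are not sorted.
data = np.array([2, 1, 3])
indices = np.array([2, 0, 1])
indptr = np.array([0, 2, 3])
M = csr_matrix((data, indices, indptr), shape=(2, 3))

before = M.has_sorted_indices
dense_before = M.toarray()

M.sort_indices()  # sorts in place; data is permuted together with indices

after = M.has_sorted_indices
print(before, after)
print(M.indices, M.data)
```

The dense matrix is the same before and after; only the internal storage order changes, which is exactly what a precomputed permutation into .data relies on.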

I don't know which operations disturb the order. I'm fairly sure the construction you use produces a canonical matrix.

I noticed that Mr.copy uses

def _with_data(self,data,copy=True):
    """Returns a matrix with the same sparsity structure as self,
    but with different data.  By default the structure arrays
    (i.e. .indptr and .indices) are copied.
    """
    if copy:
        return self.__class__((data,self.indices.copy(),self.indptr.copy()),
                               shape=self.shape,dtype=data.dtype)
    else:
        return self.__class__((data,self.indices,self.indptr),
                               shape=self.shape,dtype=data.dtype)

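I wouldn't put much effort into avoiding copies and new matrices. Err on the side of safety rather than speed. Note that sparse makes new matrices all the time, for example when you evaluate Mr[1,:] or do math. For some repeated operations it may be worth the time to do things "in place", but don't try to do that often.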
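The same effect as the private _with_data method can be sketched with the public constructor, which accepts (data, indices, indptr). This assumes the new values are already in the matrix's internal (CSR) order; new_data and Mn are illustrative names.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
Mr = csr_matrix((data, (row, col)), shape=(3, 3))

# Assumption: new_data is already ordered like Mr.data (row-major order).
new_data = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# New matrix with the same sparsity structure; Mr itself is left untouched.
Mn = csr_matrix((new_data, Mr.indices, Mr.indptr), shape=Mr.shape)
print(Mn.toarray())
```

Building a new matrix this way skips the coo-to-csr conversion and also lets the dtype change, here from integer to float.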

You may also find the code of sparse.bmat instructive. It builds a new matrix from smaller ones. Note how it combines the coo attributes into one new, larger set.
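For orientation, a minimal usage example of sparse.bmat with two small made-up blocks placed on the diagonal; None marks an empty block.

```python
import numpy as np
from scipy import sparse

A = sparse.coo_matrix(np.array([[1, 0], [0, 2]]))
B = sparse.coo_matrix(np.array([[3]]))

# Block layout [[A, 0], [0, B]]; the blocks' coordinates are offset
# and combined into a single larger matrix.
big = sparse.bmat([[A, None], [None, B]])
print(big.toarray())
```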

lil is also row-oriented:

In [256]: Ml = Mo.tolil()
In [257]: Ml.data
Out[257]: array([[1, 2], [3], [4, 5, 6]], dtype=object)
In [258]: Ml.rows
Out[258]: array([[0, 2], [2], [0, 1, 2]], dtype=object)

Setting new coefficients in a sparse matrix in scipy
