Question

For example, I have a sparse matrix in CSR format:

from scipy.sparse import csr_matrix
import numpy as np
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = csr_matrix((data, (row, col)), shape=(3, 3))
M.toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

I reorder the matrix with the index array [2, 0, 1] as follows:

order = np.array([2,0,1])
M = M[order,:]
M = M[:,order]
M.toarray()
array([[6, 4, 5],
       [2, 1, 0],
       [3, 0, 0]])

This works, but it is not feasible for my real csr_matrix, which has size 16580746 x 1672751804 and causes a memory error. I tried another approach:

edge_list = zip(row, col, data)
# map each old index to its new position in `order`
index = dict(zip(order, range(len(order))))
all_coeff = zip(*((index[u], index[v], d) for u, v, d in edge_list if u in index and v in index))
new_row, new_col, new_data = all_coeff
n = len(order)
graph = csr_matrix((new_data, (new_row, new_col)), shape=(n, n))

This also works, but it runs into the same memory error for large sparse matrices. Any suggestions for doing this efficiently?

Answer 1

Let's think smart.

Instead of reordering the matrix, why not work directly on the row and column indices you supply at the start?

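For example, to reproduce M[order, :][:, order] with order = [2, 0, 1], each old index is replaced by its position in order (0 -> 1, 1 -> 2, 2 -> 0). Your row indices change from:

[0, 0, 1, 2, 2, 2]

To:

[1, 1, 2, 0, 0, 0]

And your column indices from:

[0, 2, 2, 0, 1, 2]

To:

[1, 0, 0, 1, 2, 0]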
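
A minimal sketch of this index remapping with NumPy, using the small example from the question. Note that looking up order[row] directly produces the inverse reordering; to match M[order, :][:, order], the lookup table has to be the inverse permutation of order. The names inverse, new_row, new_col and reordered are only illustrative, and the sketch assumes order is a full permutation of 0..n-1.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
order = np.array([2, 0, 1])

# inverse[i] is the position of old index i inside `order`
# (assumes `order` is a permutation of 0..n-1)
inverse = np.empty_like(order)
inverse[order] = np.arange(order.size)

# remap the coordinates; the data array is left untouched
new_row = inverse[row]   # [1, 1, 2, 0, 0, 0]
new_col = inverse[col]   # [1, 0, 0, 1, 2, 0]

reordered = csr_matrix((data, (new_row, new_col)), shape=(3, 3))
print(reordered.toarray())
# [[6 4 5]
#  [2 1 0]
#  [3 0 0]]
```

The remapping is a vectorized lookup, so the extra memory is proportional to the number of stored entries plus the length of order, not to the dense shape. If the matrix already exists only as a csr_matrix, the same lookup can be applied to the row and col arrays of M.tocoo().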

Answer 2

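I have found matrix operations to be the most efficient approach. Here is a function that permutes the rows and/or columns into a specified order. If needed, it can be modified to swap two specific rows/columns.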

from scipy import sparse

def permute_sparse_matrix(M, new_row_order=None, new_col_order=None):
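    """
    Reorders the rows and/or columns in a scipy sparse matrix
        using the specified array(s) of indexes
        e.g., [1,0,2,3,...] would swap the first and second row/col.
    Note: row k of M is moved to row new_row_order[k] (likewise for
        columns), which is the inverse of M[new_row_order, :] unless
        the permutation is its own inverse (e.g. a simple swap).
    """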
    if new_row_order is None and new_col_order is None:
        return M
    
    new_M = M
    if new_row_order is not None:
        I = sparse.eye(M.shape[0]).tocoo()
        I.row = I.row[new_row_order]
        new_M = I.dot(new_M)
    if new_col_order is not None:
        I = sparse.eye(M.shape[1]).tocoo()
        I.col = I.col[new_col_order]
        new_M = new_M.dot(I)
    return new_M
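
One thing to watch with the function above: it sends row k of M to row new_row_order[k], so for order = [2, 0, 1] it does not give the same result as M[order, :][:, order] from the question. Below is a sketch of a variant that follows the indexing convention by building the permutation matrices directly. The name permute_like_indexing and its parameters are hypothetical, and it assumes each order array holds valid row or column indices.

```python
import numpy as np
from scipy import sparse

def permute_like_indexing(M, row_order=None, col_order=None):
    """Sketch: same result as M[row_order, :][:, col_order], via sparse products."""
    out = sparse.csr_matrix(M)
    if row_order is not None:
        row_order = np.asarray(row_order)
        n = row_order.size
        # P[k, row_order[k]] = 1, so (P @ out)[k, :] == out[row_order[k], :]
        P = sparse.csr_matrix(
            (np.ones(n, dtype=out.dtype), (np.arange(n), row_order)),
            shape=(n, out.shape[0]),
        )
        out = P @ out
    if col_order is not None:
        col_order = np.asarray(col_order)
        n = col_order.size
        # Q[col_order[k], k] = 1, so (out @ Q)[:, k] == out[:, col_order[k]]
        Q = sparse.csr_matrix(
            (np.ones(n, dtype=out.dtype), (col_order, np.arange(n))),
            shape=(out.shape[1], n),
        )
        out = out @ Q
    return out

# usage with the matrix from the question
M = sparse.csr_matrix(np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]]))
order = [2, 0, 1]
result = permute_like_indexing(M, order, order)
print(result.toarray())
# [[6 4 5]
#  [2 1 0]
#  [3 0 0]]
```

Each permutation matrix stores only one entry per row or column, so the products stay sparse throughout.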

Reordering rows and columns in a CSR matrix
