Question

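I have a generator that yields one-dimensional numpy.arrays of the same length. I would like to have a sparse matrix containing that data. Rows are generated in the same order I want them to have in the final matrix. A csr matrix is preferable to a lil matrix, but I assume the latter will be easier to build in the scenario I'm describing.

Assuming row_gen is a generator yielding numpy.array rows, the following code works as expected.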

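import numpy
import scipy.sparse

def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

# list() materializes every dense row before the sparse matrix is built
matrix = scipy.sparse.lil_matrix(list(row_gen()))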

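Because the list essentially defeats any advantage of the generator, I'd like the following to produce the same end result. More specifically, I cannot hold the entire dense matrix (or a list of all matrix rows) in memory: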

def row_gen():
    yield numpy.array([1, 2, 3])
    yield numpy.array([1, 0, 1])
    yield numpy.array([1, 0, 0])

matrix = scipy.sparse.lil_matrix(row_gen())

However, it raises the following exception when run:

TypeError: no supported conversion for types: (dtype('O'),)

I also noticed that the traceback includes the following:

File "/usr/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 122, in __init__
  A = csr_matrix(A, dtype=dtype).tolil()

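This makes me think that scipy.sparse.lil_matrix ends up creating a csr matrix first and only then converts it to a lil matrix. In that case I would rather just create the csr matrix to begin with.

To recap, my question is: what is the most efficient way to create a scipy.sparse matrix from a Python generator or from one-dimensional numpy arrays?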

Answer 1

Let's look at the code for sparse.lil_matrix. It checks the first argument:

if isspmatrix(arg1):    # is it already a sparse matrix?
    ...
elif isinstance(arg1,tuple):    # is it the shape tuple?
    if isshape(arg1):
        if shape is not None:
            raise ValueError('invalid use of shape parameter')
        M, N = arg1
        self.shape = (M,N)
        self.rows = np.empty((M,), dtype=object)
        self.data = np.empty((M,), dtype=object)
        for i in range(M):
            self.rows[i] = []
            self.data[i] = []
    else:
        raise TypeError('unrecognized lil_matrix constructor usage')
else:
    # assume A is dense
    try:
        A = np.asmatrix(arg1)
    except TypeError:
        raise TypeError('unsupported matrix type')
    else:
        from .csr import csr_matrix
        A = csr_matrix(A, dtype=dtype).tolil()

        self.shape = A.shape
        self.dtype = A.dtype
        self.rows = A.rows
        self.data = A.data

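This matches the documentation: you can construct it from another sparse matrix, from a shape, or from a dense array. The dense-array constructor first makes a csr matrix and then converts it to lil.

The shape version constructs an empty lil with data like this: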

In [161]: M=sparse.lil_matrix((3,5),dtype=int)
In [163]: M.data
Out[163]: array([[], [], []], dtype=object)
In [164]: M.rows
Out[164]: array([[], [], []], dtype=object)

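Obviously, passing a generator won't work: it is not a dense array.

But once the lil matrix has been created, you can fill in elements with regular array assignment: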

In [167]: M[0,:]=[1,0,2,0,0]
In [168]: M[1,:]=[0,0,2,0,0]
In [169]: M[2,3:]=[1,1]
In [170]: M.data
Out[170]: array([[1, 2], [2], [1, 1]], dtype=object)
In [171]: M.rows
Out[171]: array([[0, 2], [2], [3, 4]], dtype=object)
In [172]: M.A
Out[172]: 
array([[1, 0, 2, 0, 0],
       [0, 0, 2, 0, 0],
       [0, 0, 0, 1, 1]])
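
Applied to the generator from the question, the row assignment above could look roughly like the following. This is only a sketch: lil_from_rows is a name I made up, and it assumes the final shape is known before the generator is consumed.

```python
import numpy as np
from scipy import sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def lil_from_rows(rows, shape, dtype=int):
    # Hypothetical helper: the shape has to be known up front.
    M = sparse.lil_matrix(shape, dtype=dtype)
    for i, row in enumerate(rows):
        # Only one dense row is alive at a time.
        M[i, :] = row
    return M

matrix = lil_from_rows(row_gen(), (3, 3))
csr = matrix.tocsr()  # convert once, after all rows are in
```

The conversion to csr happens a single time at the end, so the generator is never expanded into a list of dense rows.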

You can also assign values to the sublists directly (I think this is faster, but a little more dangerous):

In [173]: M.data[1]=[1,2,3]
In [174]: M.rows[1]=[0,2,4]
In [176]: M.A
Out[176]: 
array([[1, 0, 2, 0, 0],
       [1, 0, 2, 0, 3],
       [0, 0, 0, 1, 1]])
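
For the generator case, the direct assignment might be sketched like this. lil_from_rows_direct is a hypothetical name, and the sketch assumes each yielded row is a 1-D numpy array whose length matches the number of columns; nothing here validates that, which is part of why this route is riskier.

```python
import numpy as np
from scipy import sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def lil_from_rows_direct(rows, shape, dtype=int):
    # Hypothetical helper: writes the lil internals without any checks.
    M = sparse.lil_matrix(shape, dtype=dtype)
    for i, row in enumerate(rows):
        cols = np.flatnonzero(row)      # sorted column indices of nonzeros
        M.rows[i] = cols.tolist()       # column indices for row i
        M.data[i] = row[cols].tolist()  # matching nonzero values
    return M

matrix = lil_from_rows_direct(row_gen(), (3, 3))
```

np.flatnonzero returns the indices in increasing order, so each row's column list stays sorted.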

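Another incremental approach is to build the three arrays or lists of the coo format (data, row indices, column indices), and then make a coo or csr matrix from those.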
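
A minimal sketch of the coo route, assuming only the number of columns is known ahead of time (csr_from_rows and its n_cols parameter are names I picked for illustration):

```python
import numpy as np
from scipy import sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_from_rows(rows, n_cols, dtype=int):
    # Hypothetical helper: collects coo triplets, then converts to csr.
    data, row_idx, col_idx = [], [], []
    n_rows = 0
    for i, row in enumerate(rows):
        cols = np.flatnonzero(row)        # positions of nonzeros in this row
        data.extend(row[cols].tolist())
        row_idx.extend([i] * len(cols))
        col_idx.extend(cols.tolist())
        n_rows = i + 1                    # row count is discovered as we go
    coo = sparse.coo_matrix((data, (row_idx, col_idx)),
                            shape=(n_rows, n_cols), dtype=dtype)
    return coo.tocsr()

matrix = csr_from_rows(row_gen(), n_cols=3)
```

Only the nonzero entries are kept in the three lists, and unlike the lil versions the number of rows does not have to be known before iterating.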

sparse.bmat is another option, and its code is a good example of building the coo inputs. I'll let you look at that yourself.
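
If you want to try the block-stacking route without reading the bmat source first, something along these lines may work. Note that I am swapping in sparse.vstack rather than calling bmat directly, and csr_via_vstack is a made-up name. The sketch still keeps a list, but a list of one-row sparse matrices rather than dense rows.

```python
import numpy as np
from scipy import sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_via_vstack(rows):
    # Hypothetical helper: each dense row becomes a 1 x N csr matrix
    # right away, so the dense version can be discarded immediately.
    sparse_rows = [sparse.csr_matrix(row) for row in rows]
    return sparse.vstack(sparse_rows, format="csr")

matrix = csr_via_vstack(row_gen())
```

Creating one small sparse matrix per row carries noticeable overhead, so I would expect the coo lists to be the better choice for many short rows, though I have not timed it.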
