Question

I have two arrays, A and B. In NumPy, you can use A as an index into B, e.g.

import numpy as np

A = np.array([[1,2,3,1,7,3,1,2,3],[4,5,6,4,5,6,4,5,6],[7,8,9,7,8,9,7,8,9]])
B = np.array([1,2,3,4,5,6,7,8,9,0])
c = B[A]

which produces:

[[2 3 4 2 8 4 2 3 4]
 [5 6 7 5 6 7 5 6 7]
 [8 9 0 8 9 0 8 9 0]]

However, in my case the arrays A and B are SciPy CSR sparse arrays, and they don't seem to support this kind of indexing.

from scipy import sparse

A_sparse = sparse.csr_matrix(A)
B_sparse = sparse.csr_matrix(B)
c = B_sparse[A_sparse]

This results in:

IndexError: Indexing with sparse matrices is not supported except boolean indexing where matrix and index are equal shapes.

I have come up with the function below to replicate NumPy's behavior with sparse arrays:

def index_sparse(A, B):
    A_sparse = sparse.coo_matrix(A)
    B_sparse = sparse.csr_matrix(B)
    res = sparse.csr_matrix(A_sparse)
    # Visit every stored entry of A and look its value up in B
    for i, j, v in zip(A_sparse.row, A_sparse.col, A_sparse.data):
        res[i, j] = B_sparse[0, v]
    return res

res = index_sparse(A, B)
print(res.todense())

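Looping over the array in Python and having to create a new array is not ideal. Is there a better way of doing this using built-in functions from SciPy / NumPy?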

Answer 1

Sparse indexing is less developed than dense indexing. The coo format, for example, doesn't implement it at all.

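I haven't tried to implement this particular problem, though I have answered other questions that involve working with the sparse format attributes. So I'll just make some general observations.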

B_sparse is a matrix, so its shape is (1,10). So the equivalent of B[A] is

In [294]: B_sparse[0,A]
Out[294]: 
<3x9 sparse matrix of type '<class 'numpy.int32'>'
    with 24 stored elements in Compressed Sparse Row format>
In [295]: _.A
Out[295]: 
array([[2, 3, 4, 2, 8, 4, 2, 3, 4],
       [5, 6, 7, 5, 6, 7, 5, 6, 7],
       [8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)

B_sparse[A,:] or B_sparse[:,A] gives a 3d warning, since it would be trying to create a matrix version of:

In [298]: B[None,:][:,A]
Out[298]: 
array([[[2, 3, 4, 2, 8, 4, 2, 3, 4],
        [5, 6, 7, 5, 6, 7, 5, 6, 7],
        [8, 9, 0, 8, 9, 0, 8, 9, 0]]])
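To see where the third dimension comes from, the dense analogue can be checked directly in NumPy. This is only a small sketch using the arrays from the question; the name B2d is made up for illustration and nothing here is sparse-specific.

```python
import numpy as np

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

B2d = B[None, :]   # shape (1, 10), the dense counterpart of B_sparse
out = B2d[:, A]    # full slice on axis 0, 2-D integer index on axis 1
print(out.shape)   # one leading axis from the slice, then A's shape
```

The slice keeps the length-1 row axis and the 2-D index contributes two more axes, which a 2-D-only sparse matrix cannot represent.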

As to your function:

A_sparse.nonzero() does A_sparse.tocoo() and returns its row and col. That is effectively the same as what you do.
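A quick way to check that claim, sketched with the A from the question (which has no explicit zeros, so nonzero() has nothing to filter out):

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
A_sparse = sparse.csr_matrix(A)

rows, cols = A_sparse.nonzero()   # coordinates of the nonzero entries
Ac = A_sparse.tocoo()             # same coordinates, exposed as attributes

# Compare the two sets of coordinates element by element
same = np.array_equal(rows, Ac.row) and np.array_equal(cols, Ac.col)
print(same)
```

Note that nonzero() additionally masks out any stored entries whose value is 0, so the two can differ for matrices that carry explicit zeros.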

Here's something that should be faster, though I haven't tested it enough to be sure it is robust:

In [342]: Ac=A_sparse.tocoo()
In [343]: res=Ac.copy()
In [344]: res.data[:]=B_sparse[0, Ac.data].A[0]
In [345]: res
Out[345]: 
<3x9 sparse matrix of type '<class 'numpy.int32'>'
    with 27 stored elements in COOrdinate format>
In [346]: res.A
Out[346]: 
array([[2, 3, 4, 2, 8, 4, 2, 3, 4],
       [5, 6, 7, 5, 6, 7, 5, 6, 7],
       [8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)

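In this example there are also explicit zeros stored in res that could be cleaned up (three of them, where A is 9; look at res.nonzero()).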
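One way to do that cleanup is eliminate_zeros(), which drops stored zeros in place. The sketch below rebuilds a result with explicit zeros using a dense lookup purely for setup (an assumption made to keep the example short); the name res_csr is illustrative.

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

Ac = sparse.coo_matrix(A)
res = Ac.copy()
res.data[:] = B[Ac.data]    # dense lookup, only to set up the example

res_csr = res.tocsr()
res_csr.eliminate_zeros()   # in place: removes stored entries equal to 0
print(res_csr.nnz)          # number of stored entries after the cleanup
```

The dense view of the matrix is unchanged by this step; only the stored structure shrinks.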

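Since you are setting each res[i,j] with the values from Ac.row and Ac.col, res has the same row,col values as Ac, so I initialize it as a copy. Then it's just a matter of updating the res.data attribute.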
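Put together, the idea could be wrapped up roughly as follows. This is a sketch, not a tested library routine: the function name index_sparse_fast is hypothetical, and it assumes B_sparse has shape (1, n) and that the stored values of A are valid integer indices into it. Entries of A that are not stored (implicit zeros) stay empty in the result, whereas dense B[A] would put B[0] there.

```python
import numpy as np
from scipy import sparse

def index_sparse_fast(A_sparse, B_sparse):
    """Hypothetical helper: mimic B[A] on the stored entries of A."""
    Ac = A_sparse.tocoo()
    res = Ac.copy()   # same row/col structure as A
    # One fancy-indexing call on row 0 of B replaces the Python loop
    res.data[:] = B_sparse[0, Ac.data].toarray().ravel()
    return res.tocsr()

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

res = index_sparse_fast(sparse.csr_matrix(A), sparse.csr_matrix(B))
print(res.toarray())
```

For this A, which has no zeros, the dense form of the result matches B[A].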
