Question

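I have a sparse matrix with only a few nonzero elements, and I want to normalize its rows. When I do this, however, the result is converted to a dense numpy array, which is unacceptable from a performance standpoint.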

To make things concrete, consider the following example:

from scipy.sparse import csr_matrix

x = csr_matrix([[0, 1, 1], [2, 3, 0]])  # sparse
normalization = x.sum(axis=1)  # dense, this is OK

x / normalization  # this is dense, not OK, can be huge

Is there an elegant way to do this without resorting to loops?

Edit

Yes, this can be done with sklearn.preprocessing.normalize using 'l1' normalization, but I would prefer not to depend on sklearn.

Answer 1

You can always use the CSR internals:

>>> import numpy as np
>>> from scipy import sparse
>>>
>>> x = sparse.csr_matrix([[0, 1, 1], [2, 3, 0]])
>>>
>>> x.data = x.data / np.repeat(np.add.reduceat(x.data, x.indptr[:-1]), np.diff(x.indptr))
>>> x
<2x3 sparse matrix of type '<class 'numpy.float64'>'
	with 4 stored elements in Compressed Sparse Row format>
>>> x.toarray()
array([[0. , 0.5, 0.5],
       [0.4, 0.6, 0. ]])
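The one-liner above works directly on x.data and x.indptr, so it may need extra care when the matrix has empty rows. As an alternative sketch (not the answerer's method), the rows can be scaled by left-multiplying with a sparse diagonal matrix of reciprocal row sums, which keeps the result sparse. The function name normalize_rows_l1 is hypothetical, and the sketch assumes non-negative entries, so that the row sum equals the L1 norm, and that all-zero rows should simply be left as they are.

```python
import numpy as np
from scipy import sparse


def normalize_rows_l1(x):
    """Return a CSR copy of x in which every non-zero-sum row sums to 1.

    Hypothetical helper: assumes non-negative entries, so the row sum
    is the same as the row's L1 norm.
    """
    x = sparse.csr_matrix(x, dtype=np.float64)
    # x.sum(axis=1) is a dense (n_rows, 1) result; flatten it to 1-D.
    row_sums = np.asarray(x.sum(axis=1)).ravel()
    # Rows that sum to zero keep a scale of 1 to avoid dividing by zero.
    scale = np.ones_like(row_sums)
    nonzero = row_sums != 0
    scale[nonzero] = 1.0 / row_sums[nonzero]
    # Left-multiplying by a diagonal matrix scales row i by scale[i].
    return (sparse.diags(scale) @ x).tocsr()


x = sparse.csr_matrix([[0, 1, 1], [2, 3, 0], [0, 0, 0]])
normalized = normalize_rows_l1(x)
print(sparse.issparse(normalized))
print(normalized.toarray())
```

The only dense object here is the vector of row sums, one value per row, which the question already accepts as fine.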


Normalizing a sparse row probability matrix
