Question

I have a sparse matrix X:

<1000000x153047 sparse matrix of type '<class 'numpy.float64'>'
with 5082518 stored elements in Compressed Sparse Column format>

I also have an array

columns_to_use

that holds the indices of 10000 columns of X. I want to keep only those columns and drop all the others. I tried this code:

X_new = X[:, columns_to_use]

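It works for a small X (10,000 rows), but with 100,000 rows or more I get a MemoryError. How can I extract specific columns without running out of memory?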
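For reference, a CSC matrix stores each column as a contiguous run in its `data`/`indices` arrays, delimited by `indptr`, so a column subset can in principle be assembled by copying only those runs. The sketch below does that directly. It is an illustrative alternative, not the approach used in the answer; `select_csc_columns` is a hypothetical helper name, and it assumes `X` is (or can be converted to) CSC.

```python
import numpy as np
import scipy.sparse as sp


def select_csc_columns(X, cols):
    """Hypothetical helper: return X[:, cols] by copying only the
    stored entries of the selected columns from the CSC arrays."""
    X = sp.csc_matrix(X)
    cols = np.asarray(cols, dtype=np.int64)

    # Start/end offsets of each selected column inside X.data / X.indices.
    starts = X.indptr[cols]
    ends = X.indptr[cols + 1]
    lengths = ends - starts

    # New column pointer: cumulative count of stored entries per column.
    new_indptr = np.zeros(len(cols) + 1, dtype=np.int64)
    np.cumsum(lengths, out=new_indptr[1:])
    total = int(new_indptr[-1])

    # Map each output position k (in column j) back to starts[j] + (k - new_indptr[j]).
    idx = np.arange(total, dtype=np.int64) + np.repeat(
        starts.astype(np.int64) - new_indptr[:-1], lengths
    )

    return sp.csc_matrix(
        (X.data[idx], X.indices[idx], new_indptr),
        shape=(X.shape[0], len(cols)),
    )
```

Memory use here is roughly proportional to the number of stored elements in the chosen columns, not to the number of rows.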

Answer 1

I came up with this solution:

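from scipy.sparse import hstack

cols = []
for i in columns_to_use:
    cols.append(X[:, i])  # each X[:, i] is a single-column sparse matrix
X_new = hstack(cols, format='csc')  # stack the selected columns side by side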

It runs fast enough and without any errors. It's quite simple.
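A small self-contained run of the column-by-column approach is shown below. It uses a randomly generated stand-in for `X` (the sizes are assumptions, far smaller than the real matrix) and checks that the stacked result matches plain fancy indexing.

```python
import numpy as np
import scipy.sparse as sp

# Small stand-in for the large CSC matrix in the question (assumed sizes).
X = sp.random(1000, 500, density=0.01, format="csc", random_state=42)
columns_to_use = np.array([7, 3, 499, 120])

# Slice one column at a time, then stack the slices horizontally as CSC.
cols = [X[:, i] for i in columns_to_use]
X_new = sp.hstack(cols, format="csc")

# On a small input, fancy indexing fits in memory and serves as a reference.
X_ref = X[:, columns_to_use]
print(X_new.shape, np.array_equal(X_new.toarray(), X_ref.toarray()))
```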