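I am trying to build, quickly and efficiently, a matrix out of many 1xN matrices, to be used later as features for scikit-learn training. One of the many things I have tried so far is: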
np.matrix(list(func(text) for text in data_test.data))
which creates a matrix of matrices, like this:
matrix([[ <1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 10921 stored elements in Compressed Sparse Row format>,
<1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 17651 stored elements in Compressed Sparse Row format>,
<1x188796 sparse matrix of type '<type 'numpy.float64'>'
with 28180 stored elements in Compressed Sparse Row format>,...
Obviously, this is not what I really want. How can I turn it into a proper single matrix, like the following:
<76002x108800 sparse matrix of type '<type 'numpy.float64'>'
with 807960 stored elements in Compressed Sparse Row format>
Answer 0 (score: 2)
How about scipy.sparse.vstack?
http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.sparse.vstack.html
If it is too slow, take the fast path from here: https://github.com/scipy/scipy/blob/master/scipy/sparse/construct.py#L396 (In future SciPy versions, vstack itself will be fast in this case.)
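
As a rough illustration of the vstack suggestion, here is a minimal sketch. The func below is only a hypothetical stand-in for the func in the question (the real one is not shown), and N_FEATURES and docs are made-up values; the only thing assumed about the real func is that it returns a 1xN CSR matrix per text.

```python
import numpy as np
from scipy import sparse

N_FEATURES = 16  # assumed width; the question's rows are 1x188796


def func(text):
    # Hypothetical stand-in for the question's func: counts words by
    # length and returns the counts as a 1 x N_FEATURES CSR row.
    counts = np.zeros(N_FEATURES)
    for word in text.split():
        counts[len(word) % N_FEATURES] += 1.0
    return sparse.csr_matrix(counts)


docs = ["a bb ccc", "bb bb", "dddd"]  # stand-in for data_test.data

# Collect the 1xN rows in a plain list, then stack them once.
rows = [func(text) for text in docs]
X = sparse.vstack(rows, format="csr")

print(repr(X))
```

The result is a single len(docs) x N_FEATURES sparse matrix instead of a dense matrix whose entries are sparse objects, so it should be usable wherever scikit-learn accepts a CSR feature matrix.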