Converting a list of pyspark.mllib.linalg.SparseVectors to a csr_matrix

Time: 2018-06-11 17:21:42

Tags: python python-3.x scipy pyspark sparse-matrix

Let truearray be a numpy array of pyspark.mllib.linalg.SparseVector objects:

>>> truearray

array([SparseVector(262144, {0: 1.0, 72: 1.0, 106: 1.0, 202: 1.0, 413: 1.0, 417: 1.0}),
       SparseVector(262144, {0: 1.0, 9: 1.0, 533: 1.0, 781: 1.0, 3918: 1.0}),
       SparseVector(262144, {0: 1.0, 9: 1.0, 533: 1.0, 781: 1.0, 3918: 1.0}),
       ...

Its length is about 250k, so as a dense matrix it would have roughly 65 billion elements (250k rows × 262144 columns). No, thanks.
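As a rough check of the dense size, here is a small sketch. The exact row count of 250000 and the 8-byte float64 element size are my assumptions; only the 262144 column count comes from the vectors above.

```python
# Back-of-the-envelope size of the dense equivalent.
n_rows = 250000          # assumption: "about 250k" taken as exactly 250,000
n_cols = 262144          # size reported by each SparseVector above
bytes_per_element = 8    # assumption: float64 storage

n_elements = n_rows * n_cols
dense_bytes = n_elements * bytes_per_element

print("elements:", n_elements)
print("dense size in GB:", dense_bytes / 1e9)
```

Under those assumptions the dense array would need several hundred gigabytes, which is why a sparse representation is the only practical option here.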

I want to convert it to a scipy.sparse.csr_matrix. This does not work:

>>> from scipy.sparse import csr_matrix
>>> truem = csr_matrix(truearray)

File "/opt/zeppelin-env/lib/python3.5/site-packages/scipy/sparse/sputils.py", line 51, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('O'),)

How can I do this?
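The TypeError shows that csr_matrix received an array with dtype('O'), that is, an array of Python objects, which it cannot convert. One possible direction, offered only as a sketch, is to build the CSR arrays by hand from each vector's size, indices and values attributes (which pyspark.mllib.linalg.SparseVector exposes) and pass them to the csr_matrix((data, indices, indptr), shape=...) constructor. The helper name sparse_vectors_to_csr and the FakeSparseVector stand-in are hypothetical; the stand-in exists only so the example runs without Spark. The sketch also assumes at least one vector and that all vectors share the same size.

```python
from collections import namedtuple

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in so the sketch runs without Spark. It mimics only the
# size / indices / values attributes of pyspark.mllib.linalg.SparseVector.
FakeSparseVector = namedtuple("FakeSparseVector", ["size", "indices", "values"])


def sparse_vectors_to_csr(vectors):
    """Stack objects exposing .size, .indices and .values into one CSR matrix.

    Assumes a non-empty input in which every vector has the same size.
    """
    vectors = list(vectors)
    if not vectors:
        raise ValueError("need at least one vector")
    n_cols = vectors[0].size

    # indptr[i]:indptr[i + 1] is the slice of indices/data that belongs to row i.
    counts = [len(v.indices) for v in vectors]
    indptr = np.concatenate(([0], np.cumsum(counts)))

    # Column indices and values of all rows, laid end to end in row order.
    indices = np.concatenate([np.asarray(v.indices) for v in vectors])
    data = np.concatenate([np.asarray(v.values, dtype=np.float64) for v in vectors])

    return csr_matrix((data, indices, indptr), shape=(len(vectors), n_cols))


# Tiny demo with two rows shaped like the ones in the question.
rows = [
    FakeSparseVector(262144, np.array([0, 72, 106]), np.array([1.0, 1.0, 1.0])),
    FakeSparseVector(262144, np.array([0, 9, 533, 781, 3918]), np.ones(5)),
]
truem = sparse_vectors_to_csr(rows)
print(truem.shape, truem.nnz)
```

With the real data the call would presumably be sparse_vectors_to_csr(truearray); I have not run this against actual Spark objects, so treat it as a starting point rather than a verified answer. Memory use scales with the number of stored non-zeros, not with the 262144-column width.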

0 answers:

No answers