Let truearray be a numpy array of pyspark.mllib.linalg.SparseVector objects:
>>> truearray
array([SparseVector(262144, {0: 1.0, 72: 1.0, 106: 1.0, 202: 1.0, 413: 1.0, 417: 1.0}),
SparseVector(262144, {0: 1.0, 9: 1.0, 533: 1.0, 781: 1.0, 3918: 1.0}),
SparseVector(262144, {0: 1.0, 9: 1.0, 533: 1.0, 781: 1.0, 3918: 1.0}),
...
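The array has roughly 250k elements, so as a dense matrix it would have on the order of 60 billion entries (250,000 × 262,144 ≈ 6.6 × 10^10). No, thanks.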
I want to convert it to a scipy.sparse.csr_matrix. This doesn't work:
>>> from scipy.sparse import csr_matrix
>>> truem = csr_matrix(truearray)
File "/opt/zeppelin-env/lib/python3.5/site-packages/scipy/sparse/sputils.py", line 51, in upcast
raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('O'),)
How can I do this?
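One possible approach, offered as a sketch under stated assumptions. Passing the object array straight to csr_matrix appears to make scipy treat it as a dense array of dtype object, which is what the TypeError above complains about. Instead, the CSR components (data, indices, indptr) can be assembled directly from each vector's stored nonzeros. The helper name sparse_vectors_to_csr and the DemoVector stand-in are hypothetical. The code assumes every element exposes .size, .indices and .values the way pyspark.mllib.linalg.SparseVector does, and that all vectors share the same size.

```python
from collections import namedtuple

import numpy as np
from scipy.sparse import csr_matrix


def sparse_vectors_to_csr(vectors):
    """Stack SparseVector-like rows into a single scipy.sparse.csr_matrix.

    Assumption: each element has .size, .indices and .values attributes
    (as pyspark.mllib.linalg.SparseVector does) and all sizes are equal.
    """
    vectors = list(vectors)
    if not vectors:
        raise ValueError("need at least one vector")
    n_cols = vectors[0].size
    if any(v.size != n_cols for v in vectors):
        raise ValueError("all vectors must have the same size")

    # indptr[i]:indptr[i + 1] is the slice of data/indices holding row i
    nnz_per_row = [len(v.indices) for v in vectors]
    indptr = np.zeros(len(vectors) + 1, dtype=np.int64)
    indptr[1:] = np.cumsum(nnz_per_row)

    # Concatenate only the stored nonzeros; nothing dense is ever built
    indices = np.concatenate([np.asarray(v.indices, dtype=np.int64) for v in vectors])
    data = np.concatenate([np.asarray(v.values, dtype=np.float64) for v in vectors])

    return csr_matrix((data, indices, indptr), shape=(len(vectors), n_cols))


# Demo with a hypothetical stand-in type; real input would be SparseVector objects
DemoVector = namedtuple("DemoVector", ["size", "indices", "values"])
demo = sparse_vectors_to_csr([
    DemoVector(10, [0, 3], [1.0, 2.0]),
    DemoVector(10, [], []),
    DemoVector(10, [9], [5.0]),
])
print(demo.shape, demo.nnz)  # (3, 10) 3
```

With the real data this would be called as truem = sparse_vectors_to_csr(truearray). Memory then scales with the total number of stored nonzeros rather than with 250k × 262144, and because everything is concatenated once, this avoids stacking hundreds of thousands of one-row matrices.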