
I have a Spark DataFrame:

df.show()

+--------+--------+------------+
|       i|       j|       value|
+--------+--------+------------+
|     0.0|     0.0|      -516.0|
|     0.0|     2.0| 0.771516749|

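df has 1M rows. The matrix is very sparse: there are ~100K distinct values of i and ~100K distinct values of j, so on average each i has about 10 entries.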
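A quick back-of-the-envelope check of these numbers, assuming exactly 1,000,000 entries, 100,000 rows, 100,000 columns and 8-byte doubles (round figures taken from the approximate counts above):

```python
# Assumed round figures based on the approximate counts in the question.
n_rows = 100_000      # distinct values of i
n_cols = 100_000      # distinct values of j
nnz = 1_000_000       # number of stored (i, j, value) entries

density = nnz / (n_rows * n_cols)   # fraction of cells that are nonzero
avg_per_row = nnz // n_rows         # average number of entries per i
dense_bytes = n_rows * n_cols * 8   # size if every cell were stored as float64

print(f"density       = {density:.0e}")
print(f"entries per i = {avg_per_row}")
print(f"dense size    = {dense_bytes / 1e9:.0f} GB")
```

Under these assumptions only 0.01% of the cells are nonzero, while a fully dense float64 copy of the matrix would need about 80 GB.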

I get an error when computing the SVD:

from pyspark.mllib.linalg.distributed import CoordinateMatrix

cmat = CoordinateMatrix(df.rdd)
svd = cmat.computeSVD(100)

Out >>

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-a82b41a6869f> in <module>()
----> 1 svd = cmat.computeSVD(100)

AttributeError: 'CoordinateMatrix' object has no attribute 'computeSVD'

So I tried converting it to a RowMatrix:

rowmat = cmat.toRowMatrix()

But this used 200 GB on the cluster, which is not great.
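For reference, PySpark (2.2 and later) exposes `computeSVD` on `RowMatrix` and `IndexedRowMatrix` but not on `CoordinateMatrix`, which is why a conversion is needed in the first place. The plain-Python sketch below only illustrates the idea of regrouping coordinate entries into sparse rows keyed by row index, so that memory stays proportional to the number of nonzeros rather than rows × columns. It is not Spark's implementation; the helper name `coo_to_sparse_rows` is hypothetical, and summing duplicate (i, j) pairs is an assumption of this sketch.

```python
from collections import defaultdict


def coo_to_sparse_rows(entries):
    """Group (i, j, value) triples into {i: {j: value}} sparse rows.

    Hypothetical helper illustrating a row-wise sparse layout; this is
    not Spark's code. Duplicate (i, j) pairs are summed.
    """
    rows = defaultdict(dict)
    for i, j, value in entries:
        # Indices are stored as floats (e.g. 0.0) in the DataFrame, so cast them.
        i, j = int(i), int(j)
        rows[i][j] = rows[i].get(j, 0.0) + float(value)
    return dict(rows)


entries = [(0.0, 0.0, -516.0), (0.0, 2.0, 0.771516749), (3.0, 1.0, 2.5)]
rows = coo_to_sparse_rows(entries)
print(rows)
```

Only the nonzero cells are stored: rows with no entries (here 1 and 2) do not appear at all.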

How can I compute the eigenvalues of a large sparse matrix in Spark (Python)?
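Iterative eigensolvers for large sparse matrices (power iteration, Lanczos, ARPACK) only need matrix-vector products, so the matrix never has to be densified. Below is a minimal, plain-Python power iteration on AᵀA that estimates the largest singular value of A (the square root of the largest eigenvalue of AᵀA) directly from (i, j, value) triples. It only illustrates the principle and is not what Spark does internally. The function `top_singular_value` and its parameters are hypothetical, and it finds just one singular value; getting 100 of them, as in `computeSVD(100)`, needs a block or Lanczos-type method.

```python
import math


def top_singular_value(entries, n_rows, n_cols, iters=200):
    """Estimate the largest singular value of a sparse matrix A.

    Hypothetical helper: `entries` is an iterable of (i, j, value) triples.
    Power iteration on A^T A touches only the nonzeros, so memory stays
    O(nnz + n_rows + n_cols) instead of O(n_rows * n_cols).
    """
    entries = [(int(i), int(j), float(v)) for i, j, v in entries]
    # Unit-norm start vector (assumed not orthogonal to the top singular vector).
    x = [1.0 / math.sqrt(n_cols)] * n_cols
    lam = 0.0
    for _ in range(iters):
        y = [0.0] * n_rows                 # y = A x
        for i, j, v in entries:
            y[i] += v * x[j]
        z = [0.0] * n_cols                 # z = A^T y = (A^T A) x
        for i, j, v in entries:
            z[j] += v * y[i]
        lam = math.sqrt(sum(t * t for t in z))  # ||(A^T A) x|| -> top eigenvalue
        if lam == 0.0:
            break                          # A x = 0: zero matrix or unlucky start
        x = [t / lam for t in z]           # renormalise for the next iteration
    return math.sqrt(lam)                  # singular value = sqrt(eigenvalue)


# Diagonal test matrix diag(3, 2): its largest singular value is 3.
sigma = top_singular_value([(0.0, 0.0, 3.0), (1.0, 1.0, 2.0)], n_rows=2, n_cols=2)
print(sigma)
```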

Compute the eigenvectors of a large sparse matrix?
