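I am using scikit-learn's 'Spectral clustering' function. I can cluster an 8100 by 8100 matrix, but the function throws an error for a 10000 by 10000 matrix.
Has anyone used this function on large matrices?
Edit: I get the following error message: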
Not enough memory to perform factorization.
Traceback (most recent call last):
File "combined_code_img.py", line 287, in <module>
labels=spectral.fit_predict(Affinity)
File "/root/anaconda/lib/python2.7/site-packages/sklearn/base.py", line 410, in fit_predict
self.fit(X)
File "/root/anaconda/lib/python2.7/site-packages/sklearn/cluster/spectral.py", line 463, in fit
assign_labels=self.assign_labels)
File "/root/anaconda/lib/python2.7/site-packages/sklearn/cluster/spectral.py", line 258, in spectral_clustering
eigen_tol=eigen_tol, drop_first=False)
File "/root/anaconda/lib/python2.7/site-packages/sklearn/manifold/spectral_embedding_.py", line 265, in spectral_embedding
tol=eigen_tol, v0=v0)
File "/root/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1560, in eigsh
symmetric=True, tol=tol)
File "/root/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1046, in get_OPinv_matvec
return SpLuInv(A.tocsc()).matvec
File "/root/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 907, in __init__
self.M_lu = splu(M)
File "/root/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py", line 261, in splu
ilu=False, options=_options)
MemoryError
My machine has 16 GB of RAM.
Answer 0 (score: 1)
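Spectral clustering has roughly O(n³) time complexity and fairly poor space complexity: you are exhausting 16 GB of RAM on a dataset of only about 0.8 GB (a 10000x10000 array, assuming 64-bit floats). It is therefore not suitable for large datasets.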
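To make the 0.8 GB figure concrete, here is a minimal sketch of the arithmetic, assuming a dense matrix of 64-bit floats and 1 GB = 10^9 bytes. The helper name `dense_matrix_gb` is made up for illustration. It only counts the input matrix itself, not the extra memory that the sparse LU factorization in the traceback needs.

```python
def dense_matrix_gb(n, bytes_per_element=8):
    """Size in GB (10**9 bytes) of a dense n x n matrix.

    bytes_per_element=8 assumes 64-bit floats.
    """
    return n * n * bytes_per_element / 1e9


# Compare the size that worked with the size that failed.
for n in (8100, 10000):
    print("%d x %d: %.3f GB" % (n, n, dense_matrix_gb(n)))
```

The two sizes differ by only about 50% in raw storage, which suggests the factorization step, rather than the matrix itself, is what pushes memory use past the limit.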
Instead, you should use a clustering algorithm that scales better; see the benchmarks in the HDBSCAN documentation. For example, MiniBatchKMeans or DBSCAN from scikit-learn, or HDBSCAN, scale better.
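As a rough sketch of what switching algorithms could look like, the snippet below runs MiniBatchKMeans and DBSCAN on 10000 samples. It assumes you have the original feature vectors rather than only a precomputed affinity matrix. The `make_blobs` data is a synthetic stand-in, and the values of `n_clusters`, `batch_size`, `eps`, and `min_samples` are placeholders you would need to tune for your data.

```python
from sklearn.cluster import DBSCAN, MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real data (assumption: feature vectors are available).
X, _ = make_blobs(n_samples=10000, centers=5, n_features=2,
                  cluster_std=0.5, random_state=0)

# MiniBatchKMeans updates the centroids from small random batches,
# so it never needs an n x n matrix in memory.
mbk = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=3, random_state=0)
kmeans_labels = mbk.fit_predict(X)

# DBSCAN groups points by local density; the label -1 marks noise points.
db = DBSCAN(eps=0.3, min_samples=10)
dbscan_labels = db.fit_predict(X)

print("MiniBatchKMeans clusters:", len(set(kmeans_labels)))
print("DBSCAN clusters (excluding noise):", len(set(dbscan_labels) - {-1}))
```

If you only have a pairwise matrix, DBSCAN also accepts `metric='precomputed'`, but it then expects distances rather than affinities, so the matrix would have to be converted first.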