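I have a 24866×13 matrix of zeros and ones and would like to find biclusters in it. I tried scikit-learn's spectral co-clustering and spectral biclustering, but both fail with the error "ValueError: array must not contain infs or NaNs".
The matrix is stored as a NumPy array, and I have confirmed that it contains only ones and zeros, with no infs or NaNs. The error message for spectral co-clustering is: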
>>> import pandas as pd
>>> from sklearn.cluster import bicluster
>>> RNAiDf = pd.read_table(dfFile, index_col=0)
>>> RNAiDf.head()
HBEC30 H1155 HCC366 H1819 HCC44 HCC4017 H1993 H460 H2073 \
22848 1 0 0 0 0 1 0 0 0
9625 0 0 0 0 0 0 0 0 0
25 0 0 1 0 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0
10188 0 0 1 0 0 0 0 0 1
H2009 H2122 H1395 HCC95
22848 0 1 0 0
9625 0 1 0 0
25 0 0 0 1
27 0 0 0 0
10188 1 0 0 0
>>> RNAiMatrix = RNAiDf.values
>>> RNAiMatrix.shape
(24866, 13)
>>> model = bicluster.SpectralCoclustering()
>>> model.fit(RNAiMatrix)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 122, in fit
self._fit(X)
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 271, in _fit
u, v = self._svd(normalized_data, n_sv, n_discard=1)
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 135, in _svd
**kwargs)
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 296, in randomized_svd
Q = randomized_range_finder(M, n_random, n_iter, random_state)
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 229, in randomized_range_finder
Q, R = linalg.qr(Y, mode='economic')
File "/work/anaconda3/lib/python3.4/site-packages/scipy/linalg/decomp_qr.py", line 127, in qr
a1 = numpy.asarray_chkfinite(a)
File "/work/anaconda3/lib/python3.4/site-packages/numpy/lib/function_base.py", line 668, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
And for spectral biclustering:
>>> model = bicluster.SpectralBiclustering()
>>> model.fit(RNAiMatrix)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 122, in fit
self._fit(X)
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 440, in _fit
u, v = self._svd(normalized_data, n_sv, n_discard)
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 135, in _svd
**kwargs)
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 296, in randomized_svd
Q = randomized_range_finder(M, n_random, n_iter, random_state)
File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 229, in randomized_range_finder
Q, R = linalg.qr(Y, mode='economic')
File "/work/anaconda3/lib/python3.4/site-packages/scipy/linalg/decomp_qr.py", line 127, in qr
a1 = numpy.asarray_chkfinite(a)
File "/work/anaconda3/lib/python3.4/site-packages/numpy/lib/function_base.py", line 668, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
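The error seems to originate from computing the QR decomposition of some matrix. The parameters that set the number of biclusters do not seem to make any difference. I am using scikit-learn version 0.16.1, and no column is constant. What could be going wrong? Thanks in advance.
Answer 0 (score: 0)
RNAiMatrix should be an affinity matrix, where "0" means that two elements are identical. I think you had better modify this matrix; you can follow this: Using the class sklearn.cluster.SpectralClustering with parameter affinity='precomputed'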
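A separate thing that may be worth checking: in the head() output above, row 27 is all zeros. Spectral co-clustering scales the data by the inverse square root of the row and column sums, so a row (or column) that sums to zero gives 1/sqrt(0) = inf, and the QR step would then see non-finite values even though the input itself is clean. The following is only a sketch under that assumption, using NumPy alone; `drop_empty_rows_and_cols` and `demo` are hypothetical names made up for illustration, not part of scikit-learn.

```python
import numpy as np


def drop_empty_rows_and_cols(X):
    """Return X without all-zero rows/columns, plus the boolean masks used.

    Hypothetical helper for illustration; assumes X is non-negative (0/1 here).
    """
    X = np.asarray(X, dtype=float)
    row_mask = X.sum(axis=1) > 0   # True for rows with at least one nonzero entry
    col_mask = X.sum(axis=0) > 0   # True for columns with at least one nonzero entry
    return X[np.ix_(row_mask, col_mask)], row_mask, col_mask


# Small stand-in for RNAiMatrix: the second row is all zeros, like row 27 above.
demo = np.array([[1, 0, 0],
                 [0, 0, 0],
                 [0, 1, 1],
                 [1, 1, 0]])

# Reproduce the scaling factor used by the normalization: 1 / sqrt(row sums).
with np.errstate(divide="ignore"):
    row_scale = 1.0 / np.sqrt(demo.sum(axis=1))
has_inf = not np.all(np.isfinite(row_scale))   # an empty row yields inf

cleaned, row_mask, col_mask = drop_empty_rows_and_cols(demo)

# With scikit-learn available, the cleaned matrix could then be fitted, e.g.:
#   model = bicluster.SpectralCoclustering()
#   model.fit(cleaned)
# row_mask / col_mask map the resulting labels back to the original indices.
```

On the real data, `(RNAiMatrix.sum(axis=1) == 0).sum()` would show how many rows are empty. Whether dropping them is acceptable depends on the analysis; they carry no information for co-clustering, but they would need to be handled separately afterwards.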