scikit-learn biclustering error: array must not contain infs or NaNs

Asked: 2015-12-04 20:36:54

Tags: python matrix scikit-learn cluster-analysis

I have a 24866×13 matrix of ones and zeros and would like to find biclusters in it. I tried scikit-learn's spectral co-clustering and spectral biclustering, but both raise "ValueError: array must not contain infs or NaNs".

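The matrix is stored as a NumPy array, and I have confirmed that it contains only ones and zeros, with no infs or NaNs. The error message from spectral co-clustering is: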

>>> import pandas as pd
>>> from sklearn.cluster import bicluster
>>> RNAiDf = pd.read_table(dfFile, index_col=0)
>>> RNAiDf.head()
       HBEC30  H1155  HCC366  H1819  HCC44  HCC4017  H1993  H460  H2073  \
22848       1      0       0      0      0        1      0     0      0   
9625        0      0       0      0      0        0      0     0      0   
25          0      0       1      0      0        0      0     0      0   
27          0      0       0      0      0        0      0     0      0   
10188       0      0       1      0      0        0      0     0      1   

       H2009  H2122  H1395  HCC95  
22848      0      1      0      0  
9625       0      1      0      0  
25         0      0      0      1  
27         0      0      0      0  
10188      1      0      0      0  
>>> RNAiMatrix = RNAiDf.values
>>> RNAiMatrix.shape
(24866, 13)
>>> model = bicluster.SpectralCoclustering()
>>> model.fit(RNAiMatrix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 122, in fit
    self._fit(X)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 271, in _fit
    u, v = self._svd(normalized_data, n_sv, n_discard=1)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 135, in _svd
    **kwargs)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 296, in randomized_svd
    Q = randomized_range_finder(M, n_random, n_iter, random_state)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 229, in randomized_range_finder
    Q, R = linalg.qr(Y, mode='economic')
  File "/work/anaconda3/lib/python3.4/site-packages/scipy/linalg/decomp_qr.py", line 127, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "/work/anaconda3/lib/python3.4/site-packages/numpy/lib/function_base.py", line 668, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

And from spectral biclustering:

>>> model = bicluster.SpectralBiclustering()
>>> model.fit(RNAiMatrix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 122, in fit
    self._fit(X)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 440, in _fit
    u, v = self._svd(normalized_data, n_sv, n_discard)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/cluster/bicluster.py", line 135, in _svd
    **kwargs)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 296, in randomized_svd
    Q = randomized_range_finder(M, n_random, n_iter, random_state)
  File "/work/anaconda3/lib/python3.4/site-packages/sklearn/utils/extmath.py", line 229, in randomized_range_finder
    Q, R = linalg.qr(Y, mode='economic')
  File "/work/anaconda3/lib/python3.4/site-packages/scipy/linalg/decomp_qr.py", line 127, in qr
    a1 = numpy.asarray_chkfinite(a)
  File "/work/anaconda3/lib/python3.4/site-packages/numpy/lib/function_base.py", line 668, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

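The error seems to originate in the QR decomposition of some intermediate matrix. The parameter that sets the number of biclusters does not seem to make any difference. I am using scikit-learn version 0.16.1, and no column is constant. What could be going wrong? Thanks in advance.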
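For reference, the checks described in the question (only ones and zeros, no infs or NaNs, no constant column) could be sketched roughly as below. The helper name `summarize_binary_matrix` and the toy array are hypothetical stand-ins, not part of the original session. The sketch also counts all-zero rows, which the question does not mention checking; that is an assumption about what might matter, prompted by row 27 in the `head()` output being entirely zero.

```python
import numpy as np

def summarize_binary_matrix(X):
    """Return basic sanity checks for a 0/1 data matrix (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    return {
        "all_finite": bool(np.isfinite(X).all()),
        "only_zeros_and_ones": bool(np.isin(X, (0.0, 1.0)).all()),
        # Rows/columns whose entries sum to zero (all zeros for a 0/1 matrix).
        "n_zero_rows": int((X.sum(axis=1) == 0).sum()),
        "n_zero_cols": int((X.sum(axis=0) == 0).sum()),
        # Columns where every entry has the same value.
        "n_constant_cols": int((X.min(axis=0) == X.max(axis=0)).sum()),
    }

# Toy stand-in for RNAiMatrix; the second row is all zeros.
toy = np.array([[1, 0, 1],
                [0, 0, 0],
                [0, 1, 0]])
report = summarize_binary_matrix(toy)
print(report)
```

On the real data this would be called as `summarize_binary_matrix(RNAiMatrix)`.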

1 Answer:

Answer 0 (score: 0)

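RNAiMatrix should be an affinity matrix, in which "0" means that two elements are identical. I think you had better modify this matrix; you can follow this: Using the class sklearn.cluster.SpectralClustering with parameter affinity='precomputed'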
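Separately from the suggestion above, one more thing may be worth checking. The scikit-learn documentation describes spectral co-clustering as normalizing the data by its row and column sums (R^{-1/2} A C^{-1/2}), so a row that sums to zero, such as row 27 in the `head()` output, would lead to a division by zero. The NumPy-only sketch below reimplements that normalization for illustration; it is not the library's code, and `scale_normalize`, `drop_empty_rows_and_cols` and the toy array are hypothetical names and data.

```python
import numpy as np

def scale_normalize(X):
    """Compute R^{-1/2} X C^{-1/2}, with R and C the diagonal row/column sums."""
    X = np.asarray(X, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        row_scale = 1.0 / np.sqrt(X.sum(axis=1))  # inf where a row sums to 0
        col_scale = 1.0 / np.sqrt(X.sum(axis=0))  # inf where a column sums to 0
        return row_scale[:, np.newaxis] * X * col_scale[np.newaxis, :]

def drop_empty_rows_and_cols(X):
    """Keep rows and columns with a nonzero sum (assumes nonnegative entries)."""
    X = np.asarray(X)
    keep_rows = X.sum(axis=1) != 0
    keep_cols = X.sum(axis=0) != 0
    return X[keep_rows][:, keep_cols], keep_rows, keep_cols

# Toy stand-in for RNAiMatrix; the second row is all zeros.
toy = np.array([[1, 0, 1],
                [0, 0, 0],
                [0, 1, 0]])

bad = scale_normalize(toy)            # the zero row yields inf * 0 = NaN
filtered, keep_rows, keep_cols = drop_empty_rows_and_cols(toy)
good = scale_normalize(filtered)

print(np.isfinite(bad).all())
print(np.isfinite(good).all())
```

With the real data, the filtered array could then be passed to `model.fit`, using `keep_rows` to map the resulting labels back to the original rows. Whether this resolves the error in this particular case is untested. In general, dropping rows can leave a column empty, so the filter may need a second pass.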