Python SciPy: scipy.stats.spearmanr returns nans

Time: 2015-08-20 10:34:02

Tags: python scipy correlation

Edit: I think this is basically solved.

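I am using spearmanr from scipy.stats to find correlations between variables across a number of different samples. I have about 2500 variables and 36 samples (or "observations").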

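If I compute the correlations using all 36 samples, spearmanr works fine. If I use only the first 18 samples, it also works fine. However, if I use the last 18 samples, I get warnings and nans are returned.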

Here is the error:

/Home/s1215235/.local/lib/python2.7/site-packages/numpy/lib/function_base.py:1945: RuntimeWarning: invalid value encountered in true_divide
return c / sqrt(multiply.outer(d, d))
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1718: RuntimeWarning: invalid value encountered in greater
cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1718: RuntimeWarning: invalid value encountered in less
cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Home/s1215235/.local/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1719: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)

Here is the code:

populationdata = np.vstack(thing).astype(np.float)
rho, pval = stats.spearmanr(populationdata[:,sampleindexes], axis = 1)

(populationdata is a numpy array full of floats; [:,sampleindexes] just selects a subset of the columns.)

This is what rho comes back as:

[[ 1.                 nan         nan ...,  1.         -0.05882353
  -0.08574929]
 [        nan         nan         nan ...,         nan         nan
          nan]
 [        nan         nan         nan ...,         nan         nan
          nan]
 ..., 
 [ 1.                 nan         nan ...,  1.         -0.05882353
  -0.08574929]
 [-0.05882353         nan         nan ..., -0.05882353  1.          0.68599434]
 [-0.08574929         nan         nan ..., -0.08574929  0.68599434  1.        ]]

1 answer:

Answer 0 (score: 4)

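In a comment you noted, "There are a lot of 0s, though." So populationdata[:,sampleindexes] probably has rows that are all 0. Every value in such a row gets the same rank, so the rank variance is zero and the correlation works out to 0/0, which makes spearmanr return nan. For example,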

In [3]: spearmanr([[0, 0, 0], [1, 2, 3]], axis=1)
/Users/warren/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py:1957: RuntimeWarning: invalid value encountered in true_divide
  return c / sqrt(multiply.outer(d, d))
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1728: RuntimeWarning: invalid value encountered in greater
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1728: RuntimeWarning: invalid value encountered in less
  cond1 = (scale > 0) & (x > self.a) & (x < self.b)
/Users/warren/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py:1729: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
Out[3]: (nan, nan)
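
If the all-zero (or otherwise constant) rows are not interesting, one possible workaround is to drop them before calling spearmanr. The following is only a minimal sketch under that assumption, not something from the original question: the helper name drop_constant_rows and the small data array are hypothetical, and a row counts as constant here when its maximum equals its minimum.

```python
import numpy as np
from scipy import stats


def drop_constant_rows(data):
    """Hypothetical helper: remove rows whose values are all identical.

    Returns the filtered array and the boolean mask of rows that were kept,
    so results can be mapped back to the original row indices.
    """
    data = np.asarray(data, dtype=float)
    # A row is constant exactly when its maximum equals its minimum.
    keep = data.max(axis=1) != data.min(axis=1)
    return data[keep], keep


# Made-up data for illustration: the first row is all zeros.
data = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
])

filtered, keep = drop_constant_rows(data)

# With axis=1, each remaining row is treated as one variable.
rho, pval = stats.spearmanr(filtered, axis=1)

print(keep)
print(rho)
```

The keep mask tells you which of the original rows survived, for example via np.flatnonzero(keep). If you need the full-size matrix instead, you can leave the rows in and treat the nan entries as "undefined" later; which of the two is appropriate depends on what the zeros mean in your data.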