scipy.stats errors when computing the standard normal in Python

Asked: 2018-02-01 08:26:25

Tags: python numpy scipy

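I am trying to find the probability under the normal distribution for my data, held in a DataFrame df, in Python. I have no experience with Python or with programming. The following user-defined function, which I took from this site, works, but the scipy function does not...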

UDF:

import numpy as np
# Normal probability density at x for mean mu and standard deviation sigma
def normal(x,mu,sigma):
    return ( 2.*np.pi*sigma**2. )**-.5 * np.exp( -.5 * (x-mu)**2. / sigma**2. )
df["normprob"] = normal(df["return"],df["meanreturn"],df["sdreturn"])

This scipy function does not work:

df["normdistprob"] = scip.norm.sf(df["return"],df["meanreturn"],df["sdreturn"])

and it returns the following warnings:

C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1815: RuntimeWarning: invalid value encountered in true_divide
  x = np.asarray((x - loc)/scale, dtype=dtyp)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1816: RuntimeWarning: invalid value encountered in greater
  cond0 = self._argcheck(*args) & (scale > 0)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1817: RuntimeWarning: invalid value encountered in greater
  cond1 = self._open_support_mask(x) & (scale > 0)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)

Any suggestions are appreciated. Also note that the first 20 cells of

df["meanreturn"]

are NA; I am not sure whether that affects it.

1 answer:

Answer 0 (score: 0)

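I am not sure the survival function is what you need. I believe what you are looking for is scipy's pdf function, specifically the pdf of a normal random variable. I tested it against the custom function you are using.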

>>> from scipy.stats import norm
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'x': [0.6, 0.5, 0.13], 'mu': [0, 1, 1], 'std': [1, 2, 1]})
>>> norm.pdf(df['x'], df['mu'], df['std'])
array([ 0.3332246 ,  0.19333406,  0.27324443])
>>> def normal(x,mu,sigma):
...     return ( 2.*np.pi*sigma**2. )**-.5 * np.exp( -.5 * (x-mu)**2. / sigma**2. )
...
>>> normal(df['x'], df['mu'], df['std'])
0    0.333225
1    0.193334
2    0.273244
dtype: float64
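
To make the difference between the survival function and the pdf concrete, here is a minimal sketch. The values of x, mu and sigma are illustrative assumptions, not taken from the question's data. It evaluates the density, the upper-tail probability and the lower-tail probability at the same point:

```python
from scipy.stats import norm

# Illustrative values (assumptions, not from the question's data)
x, mu, sigma = 0.6, 0.0, 1.0

density = norm.pdf(x, loc=mu, scale=sigma)     # height of the bell curve at x
upper_tail = norm.sf(x, loc=mu, scale=sigma)   # survival function: P(X > x)
lower_tail = norm.cdf(x, loc=mu, scale=sigma)  # cumulative distribution: P(X <= x)

print(density, upper_tail, lower_tail)
```

The custom function in the question computes the density, so norm.pdf is the matching call. norm.sf answers a different question (how much probability lies above x), and the two tail probabilities add up to 1.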

Note that if your mu or std column contains np.nan, you will get runtime warnings but still get output, just as with the custom function.

>>> df = pd.DataFrame({'x': [0.6, 0.5, 0.13], 'mu': [np.nan, 1, 1], 'std': [np.nan, 2, np.nan]})
>>> norm.pdf(df['x'], df['mu'], df['std'])
C:\Users\lyang3\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1650: RuntimeWarning: invalid value encountered in greater
  cond0 = self._argcheck(*args) & (scale > 0)
C:\Users\lyang3\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:876: RuntimeWarning: invalid value encountered in greater_equal
  return (self.a <= x) & (x <= self.b)
C:\Users\lyang3\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:876: RuntimeWarning: invalid value encountered in less_equal
  return (self.a <= x) & (x <= self.b)
C:\Users\lyang3\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1651: RuntimeWarning: invalid value encountered in greater
  cond1 = self._support_mask(x) & (scale > 0)
array([        nan,  0.19333406,         nan])
>>> normal(df['x'], df['mu'], df['std'])
0         NaN
1    0.193334
2         NaN
dtype: float64
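
If the warnings are a nuisance, one possible approach (a sketch, not the only option) is to call norm.pdf only on the rows where both parameters are present and leave the rest as NaN. The column name pdf and the mask name valid are hypothetical choices for this illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

df = pd.DataFrame({'x': [0.6, 0.5, 0.13],
                   'mu': [np.nan, 1, 1],
                   'std': [np.nan, 2, np.nan]})

# Rows where both the mean and the standard deviation are available
valid = df['mu'].notna() & df['std'].notna()

# Start with NaN everywhere, then fill in only the valid rows
df['pdf'] = np.nan
df.loc[valid, 'pdf'] = norm.pdf(df.loc[valid, 'x'],
                                df.loc[valid, 'mu'],
                                df.loc[valid, 'std'])
print(df)
```

Because NaN never reaches scipy, the comparisons that trigger the warnings are not evaluated on invalid values.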

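If you set the np.nan values to None, the warnings do not appear in the session below. Keep in mind, though, that pandas stores None in a numeric column as NaN, so the inputs are effectively the same; the silence is probably because Python, by default, shows a given warning only once per code location: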

>>> df = pd.DataFrame({'x': [0.6, 0.5, 0.13], 'mu': [None, 1, 1], 'std': [None, 2, None]})
>>> normal(df['x'], df['mu'], df['std'])
0         NaN
1    0.193334
2         NaN
dtype: float64
>>> norm.pdf(df['x'], df['mu'], df['std'])
array([        nan,  0.19333406,         nan])
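
As a quick check of how pandas treats None in numeric columns, the sketch below builds the same DataFrame and inspects the result. It assumes default pandas type inference:

```python
import pandas as pd

df = pd.DataFrame({'x': [0.6, 0.5, 0.13],
                   'mu': [None, 1, 1],
                   'std': [None, 2, None]})

# None in an otherwise numeric column is stored as a missing float value
print(df.dtypes)
print(df['mu'].isna().tolist())
print(df['std'].isna().tolist())
```

So None and np.nan end up as the same missing value by the time the columns are passed to norm.pdf.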

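Note that I would drop the rows where meanreturn or sdreturn is NaN. Otherwise, I would assume you want the probability of x under a standard normal distribution, in which case you would have to set the NaN values of meanreturn to 0 and the NaN values of sdreturn to 1.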
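
The two options can be sketched as follows. The column names follow the question; the data values and the names clean and filled are made up for illustration:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Made-up data using the question's column names
df = pd.DataFrame({
    'return': [0.6, 0.5, 0.13],
    'meanreturn': [np.nan, 1.0, 1.0],
    'sdreturn': [np.nan, 2.0, 1.0],
})

# Option 1: drop rows whose mean or standard deviation is missing
clean = df.dropna(subset=['meanreturn', 'sdreturn']).copy()
clean['normprob'] = norm.pdf(clean['return'], clean['meanreturn'], clean['sdreturn'])

# Option 2: fall back to the standard normal (mean 0, sd 1) where values are missing
filled = df.fillna({'meanreturn': 0.0, 'sdreturn': 1.0})
filled['normprob'] = norm.pdf(filled['return'], filled['meanreturn'], filled['sdreturn'])

print(clean)
print(filled)
```

Which option is appropriate depends on what the missing values mean. If the first rows are NaN only because a rolling window has not filled yet, dropping them is likely the safer choice.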

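One last comment: if every row of the DataFrame should use the standard normal distribution to compute the pdf, you do not need to pass the mu and std columns at all. norm.pdf already assumes a standard normal by default. In that case you can run the code like this:

>>> norm.pdf(df['x'])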
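
As a small sanity check of those defaults, the sketch below (the x values are illustrative) confirms that omitting loc and scale gives the same result as passing 0 and 1 explicitly:

```python
import numpy as np
from scipy.stats import norm

x = np.array([0.6, 0.5, 0.13])  # illustrative values

default_pdf = norm.pdf(x)                       # loc and scale left at their defaults
explicit_pdf = norm.pdf(x, loc=0.0, scale=1.0)  # standard normal spelled out

print(np.allclose(default_pdf, explicit_pdf))
```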