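I am trying to compute probabilities under a normal distribution for the data in my DataFrame df in Python. I have no experience with Python or programming. The following user-defined function, which I took from this site, works, but the scipy function does not...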
UDF:
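import numpy as np

def normal(x, mu, sigma):
    # Normal probability density at x for mean mu and standard deviation sigma
    return (2. * np.pi * sigma**2.)**-.5 * np.exp(-.5 * (x - mu)**2. / sigma**2.)
df["normprob"] = normal(df["return"], df["meanreturn"], df["sdreturn"])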
This scipy function does not work:
df["normdistprob"] = scip.norm.sf(df["return"],df["meanreturn"],df["sdreturn"])
and it returns the following errors:
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1815: RuntimeWarning: invalid value encountered in true_divide
x = np.asarray((x - loc)/scale, dtype=dtyp)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1816: RuntimeWarning: invalid value encountered in greater
cond0 = self._argcheck(*args) & (scale > 0)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
return (self.a < x) & (x < self.b)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
return (self.a < x) & (x < self.b)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1817: RuntimeWarning: invalid value encountered in greater
cond1 = self._open_support_mask(x) & (scale > 0)
C:\Anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
cond2 = cond0 & (x <= self.a)
Any suggestions are appreciated. Also note that the first 20 cells of df["meanreturn"] are NA; I am not sure whether that affects it.
Answer 0 (score: 0)
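I am not sure the survival function is what you need. I believe what you are looking for is scipy's pdf function, specifically the pdf of a normal random variable. I tested it against the custom function you are using.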
>>> from scipy.stats import norm
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'x': [0.6, 0.5, 0.13], 'mu': [0, 1, 1], 'std': [1, 2, 1]})
>>> norm.pdf(df['x'], df['mu'], df['std'])
array([ 0.3332246 , 0.19333406, 0.27324443])
>>> def normal(x, mu, sigma):
...     return (2. * np.pi * sigma**2.)**-.5 * np.exp(-.5 * (x - mu)**2. / sigma**2.)
...
>>> normal(df['x'], df['mu'], df['std'])
0 0.333225
1 0.193334
2 0.273244
dtype: float64
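To make the difference between the survival function and the pdf concrete, here is a minimal sketch. The single values of x, mu and sigma are taken from the second row of the test frame above; the variable names are only illustrative.

```python
import numpy as np
from scipy.stats import norm

# Illustrative values: second row of the test frame (x=0.5, mu=1, std=2).
x, mu, sigma = 0.5, 1.0, 2.0

density = norm.pdf(x, loc=mu, scale=sigma)     # height of the bell curve at x
upper_tail = norm.sf(x, loc=mu, scale=sigma)   # P(X > x), i.e. 1 - cdf

# The hand-written formula from the question reproduces pdf, not sf.
manual = (2.0 * np.pi * sigma**2) ** -0.5 * np.exp(-0.5 * (x - mu) ** 2 / sigma**2)

print(density, manual, upper_tail)
```

So norm.sf and the custom function answer different questions, which is why their numbers would not agree even without the NaN rows.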
Note that if your mu and std columns contain np.nan, you will get runtime warnings but still get output, just as with the custom function.
>>> df = pd.DataFrame({'x': [0.6, 0.5, 0.13], 'mu': [np.nan, 1, 1], 'std': [np.nan, 2, np.nan]})
>>> norm.pdf(df['x'], df['mu'], df['std'])
C:\Users\lyang3\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1650: RuntimeWarning: invalid value encountered in greater
cond0 = self._argcheck(*args) & (scale > 0)
C:\Users\lyang3\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:876: RuntimeWarning: invalid value encountered in greater_equal
return (self.a <= x) & (x <= self.b)
C:\Users\lyang3\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:876: RuntimeWarning: invalid value encountered in less_equal
return (self.a <= x) & (x <= self.b)
C:\Users\lyang3\AppData\Local\Continuum\anaconda3\lib\site-packages\scipy\stats\_distn_infrastructure.py:1651: RuntimeWarning: invalid value encountered in greater
cond1 = self._support_mask(x) & (scale > 0)
array([ nan, 0.19333406, nan])
>>> normal(df['x'], df['mu'], df['std'])
0 NaN
1 0.193334
2 NaN
dtype: float64
If you set the np.nan values to None, you can avoid the warnings:
>>> df = pd.DataFrame({'x': [0.6, 0.5, 0.13], 'mu': [None, 1, 1], 'std': [None, 2, None]})
>>> normal(df['x'], df['mu'], df['std'])
0 NaN
1 0.193334
2 NaN
dtype: float64
>>> norm.pdf(df['x'], df['mu'], df['std'])
array([ nan, 0.19333406, nan])
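Another way to keep the output quiet, offered only as a sketch and not something tested in the session above, is to wrap the call in NumPy's np.errstate context manager. Whether the warnings appear at all seems to depend on the NumPy and SciPy versions installed, so treat this as an assumption about your environment.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

df = pd.DataFrame({"x": [0.6, 0.5, 0.13],
                   "mu": [np.nan, 1, 1],
                   "std": [np.nan, 2, np.nan]})

# Silence NumPy's "invalid value" floating-point warnings only inside this block.
with np.errstate(invalid="ignore"):
    probs = norm.pdf(df["x"], df["mu"], df["std"])

print(probs)  # rows with a missing parameter stay NaN
```

This only hides the messages; the rows with missing parameters still come back as NaN.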
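As a note, I would drop the rows where meanreturn and sdreturn are NaN. Otherwise, I would assume you want the probability of x under a standard normal distribution, in which case you would have to set the NaN values of meanreturn to 0 and the NaN values of sdreturn to 1.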
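Both options can be sketched with pandas as follows. The column names follow the question; the sample values are made up for illustration, and dropna / fillna are just one way of doing this.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "return": [0.6, 0.5, 0.13],
    "meanreturn": [np.nan, 1.0, 1.0],
    "sdreturn": [np.nan, 2.0, np.nan],
})

# Option 1: keep only rows where both parameters are known.
complete = df.dropna(subset=["meanreturn", "sdreturn"]).copy()
complete["normprob"] = norm.pdf(
    complete["return"], complete["meanreturn"], complete["sdreturn"]
)

# Option 2: fall back to standard-normal parameters where they are missing.
filled = df.fillna({"meanreturn": 0.0, "sdreturn": 1.0})
filled["normprob"] = norm.pdf(
    filled["return"], filled["meanreturn"], filled["sdreturn"]
)

print(complete)
print(filled)
```

Option 1 shrinks the frame, while option 2 keeps every row but quietly changes what the result means for the filled rows, so which one is appropriate depends on your data.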
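One last comment: if every row of the dataframe should use the standard normal distribution to compute the pdf, you do not need to pass the mean and standard-deviation columns (meanreturn and sdreturn). norm.pdf already assumes a standard normal by default (loc=0, scale=1). In that case you can call it with only the data column.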
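For the standard-normal case, a minimal sketch might look like the following. The sample values are made up, and the column names "return" and "normprob" are taken from the question.

```python
import pandas as pd
from scipy.stats import norm

# Made-up sample data; only the data column is needed.
df = pd.DataFrame({"return": [0.6, 0.5, 0.13]})

# With no loc/scale arguments, norm.pdf uses mean 0 and standard deviation 1.
df["normprob"] = norm.pdf(df["return"])

print(df)
```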