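I am trying to generate a random distribution in which I control the mean, SD, skewness, and kurtosis.
I can fix the mean and SD with some simple math after the distribution has been generated.
Kurtosis I am putting on the shelf for now, as it looks too difficult.
Skewness is today's problem.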
import numpy as np
from scipy import stats

def convert_to_alpha(s):
    # Invert the skew normal skewness formula: skewness -> delta -> alpha
    d = (np.pi/2*((abs(s)**(2/3))/(abs(s)**(2/3)+((4-np.pi)/2)**(2/3))))**0.5
    a = d/((1-d**2)**.5)
    return a

for skewness_expected in (.5, .9, 1.3):
    alpha = convert_to_alpha(skewness_expected)
    r = stats.skewnorm.rvs(alpha, size=10000)
    print('Skewness expected:', skewness_expected)
    print('Skewness obtained:', stats.skew(r))
    print()
Skewness expected: 0.5
Skewness obtained: 0.47851348006629035
Skewness expected: 0.9
Skewness obtained: 0.8917020428586827
Skewness expected: 1.3
Skewness obtained: (1.2794406116842627+0.01780402125888404j)
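I know that the computed skewness will usually not match the desired skewness; this is a random distribution, after all. But I am puzzled about how to get a distribution with skewness > 1 without falling into the realm of complex numbers. The rvs method seems unable to handle it, because the parameter alpha is imaginary whenever the skewness is > 1.
How can I fix this so that I can generate distributions with skewness > 1 without complex numbers creeping in?
[Thanks to Warren Weckesser for pointing me to Wikipedia so that I could write the convert_to_alpha function.]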
Answer 0 (score: 3)
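I know this thread is a year and a half old, but I ran into this issue recently too, and it seems it was never answered here. A further problem with converting between the alpha of stats.skewnorm and the skewness statistic (an excellent function for doing that, by the way) is that the conversion also changes the distribution's measures of central tendency, which was a problem for my needs.
I have developed this based on the F distribution (https://en.wikipedia.org/wiki/F-distribution). The end result of a lot of work is the function below, to which you pass the desired mean, SD, and skewness along with the desired sample size. I can share the work behind it if anyone wants it. At extreme settings the output SD and skewness get a little rough, presumably because the F distribution naturally sits around 1. It is also quite problematic for skewness values close to zero, but in that case there is no need for this function anyway.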
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def createSkewDist(mean, sd, skew, size):
    # calculate the degrees of freedom 1 required to obtain the specific skewness statistic, derived from simulations
    loglog_slope=-2.211897875506251
    loglog_intercept=1.002555437670879
    df2=500
    df1 = 10**(loglog_slope*np.log10(abs(skew)) + loglog_intercept)

    # sample from F distribution
    fsample = np.sort(stats.f(df1, df2).rvs(size=size))

    # adjust the variance by scaling the distance from each point to the distribution mean by a constant, derived from simulations
    k1_slope = 0.5670830069364579
    k1_intercept = -0.09239985798819927
    k2_slope = 0.5823114978219056
    k2_intercept = -0.11748300123471256
    scaling_slope = abs(skew)*k1_slope + k1_intercept
    scaling_intercept = abs(skew)*k2_slope + k2_intercept
    scale_factor = (sd - scaling_intercept)/scaling_slope
    new_dist = (fsample - np.mean(fsample))*scale_factor + fsample

    # flip the distribution if specified skew is negative
    if skew < 0:
        new_dist = np.mean(new_dist) - new_dist

    # adjust the distribution mean to the specified value
    final_dist = new_dist + (mean - np.mean(new_dist))
    return final_dist
'''EXAMPLE'''
desired_mean = 497.68
desired_skew = -1.75
desired_sd = 77.24
final_dist = createSkewDist(mean=desired_mean, sd=desired_sd, skew=desired_skew, size=1000000)
# inspect the plots & moments, try random sample
fig, ax = plt.subplots(figsize=(12,7))
sns.distplot(final_dist, hist=True, ax=ax, color='green', label='generated distribution')
sns.distplot(np.random.choice(final_dist, size=100), hist=True, ax=ax, color='red', hist_kws={'alpha':.2}, label='sample n=100')
ax.legend()
print('Input mean: ', desired_mean)
print('Result mean: ', np.mean(final_dist),'\n')
print('Input SD: ', desired_sd)
print('Result SD: ', np.std(final_dist),'\n')
print('Input skew: ', desired_skew)
print('Result skew: ', stats.skew(final_dist))
Input mean: 497.68
Result mean: 497.6799999999999
Input SD: 77.24
Result SD: 71.69030764848961
Input skew: -1.75
Result skew: -1.6724486459469905
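In the run above the mean is hit exactly, but the resulting SD (about 71.69) is some way off the requested 77.24. Skewness does not change when a sample is shifted or multiplied by a positive constant, so one possible follow-up is to standardize the output and rescale it. The sketch below is not part of the original function: the helper name rescale_to_moments is made up for illustration, and the F(3, 500) sample merely stands in for any already skewed sample.

```python
import numpy as np
from scipy import stats


def rescale_to_moments(sample, mean, sd):
    """Shift and scale `sample` so its mean and population SD match exactly.

    Hypothetical helper for illustration, not part of createSkewDist.
    """
    sample = np.asarray(sample, dtype=float)
    # Standardize to zero mean and unit SD (ddof=0, the np.std default).
    z = (sample - sample.mean()) / sample.std()
    # An affine map with a positive scale leaves the skewness unchanged.
    return z * sd + mean


# Stand-in skewed sample; any right-skewed sample would do here.
rng = np.random.default_rng(0)
raw = stats.f(3, 500).rvs(size=100_000, random_state=rng)
fixed = rescale_to_moments(raw, mean=497.68, sd=77.24)

print('mean:', np.mean(fixed))
print('SD:  ', np.std(fixed))
print('skew before:', stats.skew(raw), 'after:', stats.skew(fixed))
```

Applied to the output of createSkewDist, this would pin the mean and SD while leaving whatever skewness the sample already has. It does nothing to bring the skewness itself closer to the target.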
Answer 1 (score: 0)
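The shape parameter of the skew normal distribution is not the skewness of the distribution. Take a look at the wikipedia page for the skew normal distribution. The formulas in the table on the right give expressions for the mean, variance, skewness, etc., in terms of the parameters. You can get these values from the skewnorm object using the stats() method.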
For example, here is the skewness of the distribution with shape parameter 2:
In [46]: from scipy.stats import skewnorm, skew
In [47]: skewnorm.stats(2, moments='s')
Out[47]: array(0.45382556395938217)
Generate a couple of samples and find the sample skewness:
In [48]: r = skewnorm.rvs(2, size=10000000)
In [49]: skew(r)
Out[49]: 0.4533209955299838
In [50]: r = skewnorm.rvs(2, size=10000000)
In [51]: skew(r)
Out[51]: 0.4536583726840712
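The same skewness formula also explains the complex alpha in the question. As the shape parameter grows, the skewness of the skew normal approaches a limit of roughly 0.995, so no real alpha can give a skewness of 1.3. Here is a minimal sketch of an inversion that checks this bound first. The names MAX_SKEW and skew_to_alpha are made up for illustration, and the bound is computed from the Wikipedia formula with delta set to 1.

```python
import numpy as np
from scipy import stats

# Limiting skewness of the skew normal as delta -> 1 (alpha -> infinity),
# obtained by putting delta = 1 into the skewness formula.
MAX_SKEW = (4 - np.pi) / 2 * (2 / np.pi) ** 1.5 / (1 - 2 / np.pi) ** 1.5


def skew_to_alpha(skew):
    """Invert the skew normal skewness formula (hypothetical helper name)."""
    if abs(skew) >= MAX_SKEW:
        raise ValueError(
            f'|skew| must be below {MAX_SKEW:.4f} for a skew normal'
        )
    s23 = abs(skew) ** (2 / 3)
    delta = np.sqrt(np.pi / 2 * s23 / (s23 + ((4 - np.pi) / 2) ** (2 / 3)))
    alpha = delta / np.sqrt(1 - delta ** 2)
    # Restore the sign so that negative skewness gives a negative alpha.
    return np.sign(skew) * alpha


print('Upper bound on skewness:', MAX_SKEW)
for target in (0.5, 0.9):
    alpha = skew_to_alpha(target)
    # Round trip: the theoretical skewness for this alpha should match target.
    print(target, float(stats.skewnorm.stats(alpha, moments='s')))
```

For targets beyond the bound, such as 1.3, this raises a ValueError instead of silently returning a complex number. Reaching those values needs a different family of distributions.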