I created a truncated exponential distribution:
from scipy.stats import truncexpon
truncexp = truncexpon(b = 8)
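Now I want to draw 8 points from this distribution so that their mean is approximately 4. What is the best way to do this without a huge loop that keeps resampling until the mean is close enough?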
Answer 0 (score: 0)
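The mean is a property of your distribution. If you keep sampling values, the empirical mean will get closer and closer to the analytical mean.
SciPy can tell you the mean of the truncated exponential: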
from scipy.stats import truncexpon
b = 8
truncexp = truncexpon(b)
truncexp.mean() # 0.99731539839326999
You can also draw samples from the distribution and compute the empirical mean:
import numpy as np
num_samples = 100000
np.mean(truncexp.rvs(num_samples)) # 0.99465816346645264
The mean can also be computed from a closed-form formula (second line):
b = np.linspace(0.1, 20, 100)
m = (1 - (b + 1)*np.exp(-b)) / (1 - np.exp(-b))
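If you plot this, you can see how the mean behaves for different values of b.
With the default loc=0 and scale=1, the mean approaches 1 as b -> inf, so you will not find a b for which the mean is 4.
If you want to sample from a truncated exponential with mean 4, you can simply scale the samples. This does not give you samples from the original distribution, but then again, samples from the original distribution will never give you a mean of 4.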
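As a quick check of the statement above, the following sketch compares the closed-form mean with truncexpon.mean over a range of b values, using the default loc=0 and scale=1 as in the question. The helper name closed_form_mean is illustrative and not part of SciPy.

```python
import numpy as np
from scipy.stats import truncexpon

def closed_form_mean(b):
    """Mean of the standard exponential truncated to [0, b]."""
    return (1 - (b + 1) * np.exp(-b)) / (1 - np.exp(-b))

b_values = np.linspace(0.1, 20, 100)
formula_means = closed_form_mean(b_values)
# Evaluate SciPy's mean one shape value at a time.
scipy_means = np.array([truncexpon.mean(b) for b in b_values])

# Check that both computations agree and that the mean stays below 1.
print(np.allclose(formula_means, scipy_means))
print(formula_means.max() < 1.0)
```

Both checks should print True on this grid, which is why no choice of b alone can produce a mean of 4.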
truncexp.rvs(num_samples) * 4 / truncexp.mean()
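Multiplying the samples by a constant is equivalent to passing a scale parameter to truncexpon, which also stretches the support from [0, b] to [0, b*scale]. A minimal sketch under that reading (the names target_mean, scale, and scaled_dist are illustrative):

```python
from scipy.stats import truncexpon

b = 8
target_mean = 4.0

# Scale factor that maps the mean of truncexpon(b) onto target_mean.
scale = target_mean / truncexpon.mean(b)

# Scaling the samples by `scale` matches using the scale parameter directly.
scaled_dist = truncexpon(b, scale=scale)

# ppf(0) and ppf(1) return the lower and upper ends of the support.
lower, upper = scaled_dist.ppf([0.0, 1.0])

print(scaled_dist.mean())  # analytical mean of the scaled distribution
print(lower, upper)        # support is now [0, b * scale]
```

Note that the upper end of the support moves to roughly 32, so this is only appropriate if the truncation point is allowed to change.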
Answer 1 (score: 0)
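The truncexpon distribution has three parameters: a shape b, a location loc, and a scale scale. The support of the distribution is [x1, x2], where x1 = loc and x2 = shape*scale + loc. Solving the latter equation for shape gives shape = (x2 - x1)/scale. We will choose the scale parameter so that the mean of the distribution is 4. To do that, we can apply scipy.optimize.fsolve to a function of scale that is zero when truncexpon.mean((x2 - x1)/scale, loc, scale) is 4.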
Here is a short script to demonstrate:
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import truncexpon
def func(scale, desired_mean, x1, x2):
    # Zero when the truncexpon distribution on [x1, x2] has mean desired_mean.
    return truncexpon.mean((x2 - x1)/scale, loc=x1, scale=scale) - desired_mean
x1 = 1
x2 = 9
desired_mean = 4.0
# Numerically solve for the scale parameter of the truncexpon distribution
# with support [x1, x2] for which the expected mean is desired_mean.
scale_guess = 2.0
scale = fsolve(func, scale_guess, args=(desired_mean, x1, x2))[0]
# This is the shape parameter of the desired truncexpon distribution.
shape = (x2 - x1)/scale
print("Expected mean of the distribution is %6.3f" %
      (truncexpon.mean(shape, loc=x1, scale=scale),))
print("Expected standard deviation of the distribution is %6.3f" %
      (truncexpon.std(shape, loc=x1, scale=scale),))
# Generate a sample of size 8, and compute its mean.
sample = truncexpon.rvs(shape, loc=x1, scale=scale, size=8)
print("Mean of the sample of size %d is %6.3f" %
      (len(sample), sample.mean(),))
bigsample = truncexpon.rvs(shape, loc=x1, scale=scale, size=100000)
print("Mean of the sample of size %d is %6.3f" %
      (len(bigsample), bigsample.mean(),))
Typical output:
Expected mean of the distribution is 4.000
Expected standard deviation of the distribution is 2.178
Mean of the sample of size 8 is 4.694
Mean of the sample of size 100000 is 4.002
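The gap between 4.694 and 4.000 in the output above is consistent with ordinary sampling noise: the mean of n independent draws has a standard deviation of std/sqrt(n). A small sketch of that arithmetic, taking the standard deviation 2.178 from the output above as given:

```python
import math

dist_std = 2.178  # standard deviation reported in the output above
n = 8             # sample size from the question

# Standard deviation of the mean of n independent draws.
std_of_mean = dist_std / math.sqrt(n)
print("%.3f" % std_of_mean)  # prints 0.770
```

So with only 8 points, a sample mean that lands about 0.7 away from 4 is within one standard deviation of the expected value, and a closer match requires either more points or some form of resampling.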