How to get random numbers with a specific variance from a truncated normal distribution?

Date: 2017-11-11 13:08:02

Tags: python random

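I'm trying to sample numbers from a truncated normal distribution with a given variance and given bounds on the results. For example, I need numbers with mean 0 and unit variance, but they must lie within some range, such as [-2, 2].

I can't figure out how to truncate the distribution while keeping the variance.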

import math
import numpy as np
import scipy.stats as stats


truncation = 2
lower, upper = -truncation, truncation
mu, sigma = 0, 1
num_samples = 1000
if truncation:
    n = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma)
    samples = n.rvs(num_samples)
    std_trunc = np.std(samples)

    n = stats.norm(loc=mu, scale=sigma)
    samples = n.rvs(num_samples)
    std_simple = np.std(samples)

print(std_trunc, std_simple, sep='\n')

# outputs 
# 0.859167285015  # I need number close to 1 here
# 1.01735583631  # like here, but here it's not truncated
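The shrinkage in the question is not sampling noise. Truncating N(0, 1) to [-2, 2] lowers the standard deviation to roughly 0.88 by construction, and scipy's frozen `truncnorm` can report the exact value. This short check is not part of the original post:

```python
import scipy.stats as stats

# N(0, 1) truncated to [-2, 2]; truncnorm's a and b are in standard-deviation units
dist = stats.truncnorm(-2, 2, loc=0, scale=1)
print(dist.mean(), dist.std())  # mean is 0 by symmetry, std is roughly 0.8796
```

So any sample std near 0.86 to 0.88 is what this truncation should produce; raising the `scale` is what brings it back up.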

1 answer:

Answer 0 (score: 2)

The Wikipedia page gives expressions for the observed mean and variance. We can invert them to find the values we should pass to truncnorm so that it gives us the result we want.
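For reference, these are the truncated-normal moments as given on Wikipedia, with $\varphi$ and $\Phi$ the standard normal pdf and cdf, $\mu, \sigma$ the parameters of the parent normal, and $a, b$ the lower and upper bounds. Inverting means choosing $\mu, \sigma$ so that the left-hand sides equal the desired mean and variance:

```latex
\alpha = \frac{a - \mu}{\sigma}, \qquad
\beta = \frac{b - \mu}{\sigma}, \qquad
Z = \Phi(\beta) - \Phi(\alpha)

\operatorname{E}[X] = \mu + \sigma \, \frac{\varphi(\alpha) - \varphi(\beta)}{Z}

\operatorname{Var}[X] = \sigma^2 \left[ 1 + \frac{\alpha\varphi(\alpha) - \beta\varphi(\beta)}{Z}
    - \left( \frac{\varphi(\alpha) - \varphi(\beta)}{Z} \right)^2 \right]
```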

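We won't take advantage of any simplifications that come from working with the standard normal, partly for generality and partly because I haven't had breakfast yet, so I don't want to do any arithmetic. You could probably replace the whole minimization with a simple calculation.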
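One way to follow up on that remark, sketched here under stated assumptions rather than taken from the answer: when the bounds are symmetric about the mean (as with [-2, 2] and mean 0), truncation leaves the mean unchanged, so only sigma needs solving for and a 1-D root finder (`scipy.optimize.brentq`) can stand in for the 2-D minimization. The function name `symmetric_sigma_for_std` is hypothetical, and the bracketing step assumes the truncated std grows monotonically with sigma toward the uniform limit `half_width / sqrt(3)`:

```python
import numpy as np
import scipy.stats as stats
from scipy.optimize import brentq


def symmetric_sigma_for_std(target_std, half_width):
    """Hypothetical helper: find sigma such that N(0, sigma^2) truncated to
    [-half_width, half_width] has standard deviation target_std.

    Assumes symmetric bounds (so the mean stays 0) and that the truncated std
    increases monotonically with sigma.
    """
    # As sigma -> infinity the truncated normal tends to a uniform distribution,
    # whose std is half_width / sqrt(3); no sigma can reach a larger std.
    limit = half_width / np.sqrt(3)
    if not 0 < target_std < limit:
        raise ValueError(f"target_std must lie in (0, {limit:.4f})")

    def gap(sigma):
        c = half_width / sigma  # bounds in standard-deviation units
        return stats.truncnorm(-c, c, loc=0, scale=sigma).std() - target_std

    # Truncation always shrinks the std, so gap(target_std) < 0.
    lo, hi = target_std, 2 * target_std
    while gap(hi) <= 0:  # widen the bracket until the sign changes
        hi *= 2
    return brentq(gap, lo, hi)


sigma = symmetric_sigma_for_std(1.0, 2.0)
print(sigma)  # roughly 1.378
```

The result can be passed straight to `stats.truncnorm(-2 / sigma, 2 / sigma, loc=0, scale=sigma)`. It is still a numerical solve, not a closed form, but it avoids the 2-D optimizer for the symmetric case.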

import numpy as np
import scipy.stats as stats
import scipy.optimize

def truncated_mean_std(mu, sigma, lower, upper):
    # Mean and std of N(mu, sigma^2) truncated to [lower, upper] (Wikipedia formulas).
    # N.B. lower/upper are the actual values, not Z-scaled
    alpha = (lower - mu)/sigma
    beta = (upper - mu)/sigma
    d_pdf = (stats.norm.pdf(alpha) - stats.norm.pdf(beta))
    wd_pdf = (alpha * stats.norm.pdf(alpha) - beta * stats.norm.pdf(beta))
    d_cdf = stats.norm.cdf(beta) - stats.norm.cdf(alpha)
    mu_trunc = mu + sigma * (d_pdf / d_cdf)
    var_trunc = sigma**2 * (1 + wd_pdf / d_cdf - (d_pdf/d_cdf)**2)
    std_trunc = var_trunc**0.5
    return mu_trunc, std_trunc

def trunc_samples(mu, sigma, lower, upper, num_samples=1000):
    n = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma)
    samples = n.rvs(num_samples)
    return samples

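def corrector(mu, sigma, lower, upper):
    # Find the parent (mu, sigma) whose truncation to [lower, upper] has the
    # requested mean and std, by minimizing the squared error in both.
    target = np.array([mu, sigma])
    result = scipy.optimize.minimize(
        lambda x: ((target - truncated_mean_std(x[0], x[1], lower, upper))**2).sum(),
        x0=[mu, sigma])
    return result.x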

This gives me:

In [79]: s = trunc_samples(mu=0, sigma=1, lower=-2, upper=2, num_samples=10**7)

In [80]: s.mean(), s.std()
Out[80]: (-9.8821067931585576e-05, 0.87951241887015619)

In [81]: mu_to_use, sigma_to_use = corrector(0, 1, -2, 2)

In [82]: mu_to_use, sigma_to_use
Out[82]: (-7.4553057719882245e-09, 1.3778928137492246)

In [83]: s = trunc_samples(mu=mu_to_use, sigma=sigma_to_use, lower=-2, upper=2, num_samples=10**7)

In [84]: s.mean(), s.std()
Out[84]: (0.0004091647648333381, 0.99991490259048865)

In [85]: s.min(), s.max()
Out[85]: (-1.9999995310631815, 1.9999997070340947)
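As an alternative sketch (not from the original answer), the hand-written moment formulas can be replaced by the `.mean()` and `.std()` methods of scipy's frozen `truncnorm`, and a lower bound on sigma keeps the optimizer from trying a non-positive scale. The names `make_truncnorm` and `corrected_truncnorm` are hypothetical, and using `L-BFGS-B` with those bounds is an assumption rather than the answer's choice:

```python
import scipy.optimize
import scipy.stats as stats


def make_truncnorm(mu, sigma, lower, upper):
    """Hypothetical helper: frozen N(mu, sigma^2) truncated to [lower, upper]."""
    a, b = (lower - mu) / sigma, (upper - mu) / sigma  # bounds in std units
    return stats.truncnorm(a, b, loc=mu, scale=sigma)


def corrected_truncnorm(target_mean, target_std, lower, upper):
    """Hypothetical helper: search for parent (mu, sigma) so the truncated
    distribution has the target mean and std, then return it frozen."""
    def loss(x):
        dist = make_truncnorm(x[0], x[1], lower, upper)
        return (dist.mean() - target_mean) ** 2 + (dist.std() - target_std) ** 2

    result = scipy.optimize.minimize(
        loss,
        x0=[target_mean, target_std],
        method="L-BFGS-B",
        bounds=[(None, None), (1e-6, None)],  # keep sigma strictly positive
    )
    return make_truncnorm(result.x[0], result.x[1], lower, upper)


dist = corrected_truncnorm(0, 1, -2, 2)
print(dist.mean(), dist.std())  # should be close to 0 and 1
samples = dist.rvs(size=10**5, random_state=0)
```

Returning the frozen distribution rather than the raw parameters means the caller can sample from it directly without repeating the `(lower - mu) / sigma` rescaling.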