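Python's random module (http://docs.python.org/2/library/random.html) has several fixed functions for random sampling. For example, random.gauss samples a random point from a normal distribution with a given mean and sigma.
I'm looking for a way to draw N random samples from my own distribution, within a given interval, as fast as possible in Python. This is what I mean: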
def my_dist(x):
    # Some distribution, assume c1,c2,c3 and c4 are known.
    f = c1*exp(-((x-c2)**c3)/c4)
    return f

# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)
where ran_func_sample is what I'm after, and a, b are the limits between which the samples are drawn. Is there anything like that in Python?
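Answer 0: (score: 13)
You need to use the inverse transform sampling method to get random values distributed according to the law you want. With this method, you simply apply the inverted function (the inverse of the cumulative distribution function) to random numbers drawn from the standard uniform distribution on the interval [0,1].
Once you have found the inverted function, you can get 1000 numbers distributed according to the desired distribution like this:
[inverted_function(random.random()) for x in range(1000)]
Inverse transform sampling is worth reading up on further, and there is also a good StackOverflow question related to the topic.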
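As a minimal sketch of what such an inverted function can look like, assume the target is an exponential density proportional to exp(-x/scale), restricted to [a, b]. That is a special case chosen here only because its CDF can be inverted by hand; the parameter names and default values are illustrative assumptions, not part of the answer.

```python
import math
import random

def inverted_function(u, a=0.0, b=5.0, scale=1.0):
    """Inverse CDF of the density exp(-x/scale) truncated to [a, b].

    The CDF is F(x) = (exp(-a/s) - exp(-x/s)) / (exp(-a/s) - exp(-b/s));
    solving F(x) = u for x gives the expression below.
    """
    lo = math.exp(-a / scale)
    hi = math.exp(-b / scale)
    return -scale * math.log(lo - u * (lo - hi))

# u = 0 maps to a, u = 1 maps to b, and uniform u gives the target law
samples = [inverted_function(random.random()) for _ in range(1000)]
```

For densities whose CDF has no closed-form inverse, the inversion has to be done numerically, which is what the other answers do.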
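Answer 1: (score: 9)
This code implements sampling from n-dimensional discrete probability distributions. By setting a flag on the object, it can also be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdfs. Well, arbitrary pdfs with compact support; if you want to sample very long tails efficiently, a non-uniform description of the pdf would be required. But this is still efficient even for things like Airy point-spread functions (which is what I originally created it for). The internal sorting of values is critical for accuracy; the many small values in the tails should contribute substantially, but without sorting they get drowned out by floating-point precision.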
import numpy as np

class Distribution(object):
    """
    draws samples from a discrete (optionally n-dimensional) probability distribution,
    by means of inversion of a discretized cumulative distribution function
    the pdf can be sorted first to prevent numerical error in the cumulative sum
    this is set as default; for big density functions with high contrast,
    it is absolutely necessary, and for small density functions,
    the overhead is minimal
    a call to this distribution object returns indices into density array
    """
    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
        self.shape          = pdf.shape
        self.pdf            = pdf.ravel()
        self.sort           = sort
        self.interpolation  = interpolation
        self.transform      = transform

        #a pdf can not be negative
        assert(np.all(pdf>=0))

        #sort the pdf by magnitude
        if self.sort:
            self.sortindex = np.argsort(self.pdf, axis=None)
            self.pdf = self.pdf[self.sortindex]
        #construct the cumulative distribution function
        self.cdf = np.cumsum(self.pdf)

    @property
    def ndim(self):
        return len(self.shape)

    @property
    def sum(self):
        """cached sum of all pdf values; the pdf need not sum to one, and is implicitly normalized"""
        return self.cdf[-1]

    def __call__(self, N):
        """draw N samples; returns an array of shape (ndim, N)"""
        #pick numbers which are uniformly random over the cumulative distribution function
        choice = np.random.uniform(high = self.sum, size = N)
        #find the indices corresponding to this point on the CDF
        index = np.searchsorted(self.cdf, choice)
        #if necessary, map the indices back to their original ordering
        if self.sort:
            index = self.sortindex[index]
        #map back to multi-dimensional indexing
        index = np.unravel_index(index, self.shape)
        index = np.vstack(index)
        #is this a discrete or piecewise continuous distribution?
        if self.interpolation:
            index = index + np.random.uniform(size=index.shape)
        return self.transform(index)


if __name__=='__main__':
    shape = 3,3
    pdf = np.ones(shape)
    pdf[1]=0
    dist = Distribution(pdf, transform=lambda i:i-1.5)
    print(dist(10))
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()
And as a more real-world relevant example:
x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:,None]*p[None,:]     #2d gaussian
dist = Distribution(pdf, transform=lambda i:i-256)
print(dist(1000000).mean(axis=1))   #should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
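The remark about sorting before taking the cumulative sum can be illustrated with a small experiment. This sketch is separate from the class in this answer; it uses a made-up float32 array (one dominant value plus many small ones) and assumes ordinary IEEE single-precision arithmetic.

```python
import numpy as np

# One dominant value followed by 1000 small ones, stored in single precision.
pdf = np.array([1e8] + [1.0] * 1000, dtype=np.float32)

# Unsorted: the running sum starts at 1e8. Near 1e8 the spacing between
# representable float32 numbers is 8, so each added 1.0 is rounded away.
unsorted_total = np.cumsum(pdf)[-1]

# Sorted ascending: the small values are accumulated first (to 1000.0),
# and only then is the large value added, so their contribution survives.
sorted_total = np.cumsum(np.sort(pdf))[-1]

print(unsorted_total, sorted_total)
```

In double precision the effect needs a much larger dynamic range to show up, but the mechanism is the same.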
Answer 2: (score: 4)
import numpy as np
import scipy.interpolate as interpolate

def inverse_transform_sampling(data, n_bins, n_samples):
    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
    cum_values = np.zeros(bin_edges.shape)
    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
    inv_cdf = interpolate.interp1d(cum_values, bin_edges)
    r = np.random.rand(n_samples)
    return inv_cdf(r)
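So if we pass in a data sample that follows a specific distribution, the inverse_transform_sampling function returns a dataset with approximately the same distribution (up to the resolution of the histogram bins). The advantage is that we can choose our own sample size by specifying it in the n_samples argument.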
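A quick way to check this claim is to resample from synthetic data and compare moments. The sketch below restates the same histogram idea with np.interp swapped in for scipy's interp1d (np.interp clamps out-of-range inputs to the end values instead of raising); the function name, seed, and the normal test data are assumptions made only for illustration.

```python
import numpy as np

def resample_from_histogram(data, n_bins, n_samples, rng):
    # Empirical CDF evaluated at the bin edges, starting at 0
    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
    cum_values = np.concatenate(([0.0], np.cumsum(hist * np.diff(bin_edges))))
    # Invert the piecewise-linear CDF at uniform random points
    r = rng.random(n_samples)
    return np.interp(r, cum_values, bin_edges)

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=20000)
resampled = resample_from_histogram(data, n_bins=50, n_samples=5000, rng=rng)

# The resampled mean and standard deviation should be close to 2.0 and 0.5
print(resampled.mean(), resampled.std())
```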
Answer 3: (score: 2)
Here is a rather nice way of performing inverse transform sampling with a decorator.
import numpy as np
from scipy.interpolate import interp1d

def inverse_sample_decorator(dist):

    def wrapper(pnts, x_min=-100, x_max=100, n=1e5, **kwargs):
        x = np.linspace(x_min, x_max, int(n))
        cumulative = np.cumsum(dist(x, **kwargs))
        cumulative -= cumulative.min()
        f = interp1d(cumulative/cumulative.max(), x)
        return f(np.random.random(pnts))

    return wrapper
Using this decorator on a Gaussian distribution, for example:
@inverse_sample_decorator
def gauss(x, amp=1.0, mean=0.0, std=0.2):
    return amp*np.exp(-(x-mean)**2/std**2/2.0)
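Sample points can then be generated from the distribution by calling the function. The keyword arguments x_min and x_max are the limits of the original distribution and can be passed to gauss along with the other keyword arguments that parameterise the distribution:
samples = gauss(5000, mean=20, std=0.8, x_min=19, x_max=21)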
Alternatively, this can be done as a function that takes the distribution as an argument (as in your original question):
def inverse_sample_function(dist, pnts, x_min=-100, x_max=100, n=1e5,
                            **kwargs):
    x = np.linspace(x_min, x_max, int(n))
    cumulative = np.cumsum(dist(x, **kwargs))
    cumulative -= cumulative.min()
    f = interp1d(cumulative/cumulative.max(), x)
    return f(np.random.random(pnts))
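To connect this back to the question, here is a rough sketch of the hypothetical ran_func_sample built on the same grid-based cumulative sum. The constants c1 to c4 are made-up values (the question leaves them unspecified), c3 is taken as an even integer so the power is defined for negative arguments, and np.interp is used here in place of interp1d.

```python
import numpy as np

def my_dist(x, c1=1.0, c2=0.0, c3=2, c4=2.0):
    # Distribution from the question; the constants are illustrative only
    return c1 * np.exp(-((x - c2) ** c3) / c4)

def ran_func_sample(dist, a, b, N, n_grid=100000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Tabulate the unnormalised CDF on a uniform grid over [a, b]
    x = np.linspace(a, b, n_grid)
    cumulative = np.cumsum(dist(x))
    cumulative -= cumulative[0]
    cumulative /= cumulative[-1]
    # Invert the tabulated CDF at N uniform random points
    return np.interp(rng.random(N), cumulative, x)

N_rand_samples = ran_func_sample(my_dist, -3.0, 3.0, 1000,
                                 rng=np.random.default_rng(1))
```

The grid size trades accuracy for setup cost; once the table is built, drawing samples is a single vectorised interpolation.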
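Answer 4: (score: 1)
I was in a similar situation, but I wanted to sample from a multivariate distribution, so I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).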
import numpy as np

def metropolis_hastings(target_density, size=500000):
    burnin_size = 10000
    size += burnin_size
    x0 = np.array([[0, 0]])
    xt = x0
    samples = []
    for i in range(size):
        # propose a candidate from a unit Gaussian centred on the current point
        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
        accept_prob = (target_density(xt_candidate))/(target_density(xt))
        if np.random.uniform(0, 1) < accept_prob:
            xt = xt_candidate
        samples.append(xt)
    # discard the burn-in samples and flatten to shape (n_samples, 2)
    samples = np.array(samples[burnin_size:])
    samples = np.reshape(samples, [samples.shape[0], 2])
    return samples
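This function requires a function target_density, which takes in a data point and computes its probability (it is only used in a ratio, so it does not need to be normalised).
For details, check out my detailed answer.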
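As a usage sketch, the example below defines a hypothetical target_density (an unnormalised 2-D Gaussian centred at (1, -1)) and runs a shortened, seeded restatement of the same random-walk sampler. The names run_chain and seed, and the smaller chain length, are assumptions chosen to keep the example quick.

```python
import numpy as np

def target_density(x):
    # Hypothetical target: unnormalised 2-D Gaussian centred at (1, -1).
    # x has shape (1, 2), matching how the sampler passes points.
    d = x[0] - np.array([1.0, -1.0])
    return np.exp(-0.5 * d.dot(d))

def run_chain(target_density, size=20000, burnin_size=2000, seed=0):
    rng = np.random.default_rng(seed)
    xt = np.zeros((1, 2))
    samples = []
    for _ in range(size + burnin_size):
        # Symmetric Gaussian proposal around the current point
        candidate = xt + rng.standard_normal((1, 2))
        # Accept with probability min(1, density ratio)
        if rng.uniform() < target_density(candidate) / target_density(xt):
            xt = candidate
        samples.append(xt[0])
    # Drop the burn-in portion of the chain
    return np.array(samples[burnin_size:])

samples = run_chain(target_density)
print(samples.mean(axis=0))  # should be near the target's centre
```

Successive samples from such a chain are correlated, so the effective sample size is smaller than the number of rows returned.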