Question

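Suppose I have a routine that, when called, uses an RNG and returns True 30% of the time and False otherwise. That's simple enough. But what if I want to simulate how many True results I would get if I called that routine 10 billion times?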

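Calling it 10 billion times in a loop would take too long. Multiplying 10 billion by 30% gives the statistically expected result of 3 billion, but that involves no actual randomness. (And the odds that the result would be exactly 3 billion are not that great.)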

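Is there an algorithm for simulating the aggregate result of such a series of random events, such that if it is called many times, its results show the same distribution curve as actually running the random series many times, and which runs in O(1) time (i.e., it does not take longer as the length of the series to simulate increases)?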

Answer 1

I would say it can be done in O(1)!

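The binomial distribution, which describes your case, can (in some cases) be approximated by a normal distribution. This can be done when both n*p and n*(1-p) are greater than 5, so for p=0.3 it can be done for all n ≥ 17. As n becomes very large (in the millions, say), the approximation gets better and better.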

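Random numbers with a normal distribution can be computed easily using the Box–Muller transform. All you need are two random numbers between 0 and 1. The Box–Muller transform gives you two random numbers from the N(0,1) distribution, called standard normals. A value from N(μ, σ²) can then be obtained with the formula X = μ + σZ, where Z is a standard normal. For the binomial case, μ = n*p and σ² = n*p*(1-p).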
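
The approach described above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function names `box_muller` and `approx_binomial` are made up for this example, and rounding and clamping the result to the range 0..n is an assumption added here so that the output is a valid count.

```python
import math
import random


def box_muller(rng=random):
    """Return two independent standard normal values from two uniform values."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def approx_binomial(n, p, rng=random):
    """Approximate one Binomial(n, p) draw using the normal approximation.

    The running time does not depend on n.
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    z, _ = box_muller(rng)  # the second value is simply discarded here
    # Assumption for this sketch: round to an integer and clamp to [0, n].
    return min(max(round(mu + sigma * z), 0), n)


if __name__ == "__main__":
    # Simulated number of True results out of 10 billion calls at p = 0.3.
    print(approx_binomial(10**10, 0.3))
```

For n = 10 billion and p = 0.3 this gives μ = 3 billion and σ of roughly 45,800, so repeated calls should scatter around 3 billion by a few tens of thousands.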

Answer 2

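After some deep thought, I can offer this Python solution, which works in O(log(n)) and does not use any approximation. For large n, however, @MarcinJuraszek's solution is more suitable.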

First, I generate a Python list holding the values of the cumulative binomial distribution function.
Then I use inverse transform sampling.

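The cost of the first step is O(n), but you only have to do it once. The cost of the second step is just O(log(n)), which is essentially the cost of a binary search. Since the code has a lot of dependencies, you can have a look at this screenshot: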

screenshot with plots showing the distributions

import numpy.random as random
import matplotlib.pyplot as pyplot
import scipy.stats as stats
import bisect

# This is the number of trials.
size = 6

# This creates a frozen distribution object that fully describes
# the desired binomial distribution. It only has to be created once.
# Building the lookup tables from it below
# WORKS IN O(n).
binomialInstance = stats.binom(size, 0.3)

# this pulls the probability mass function in the form of a python list
binomialTable = [binomialInstance.pmf(i) for i in range(size + 1)]

# this pulls a python list from binomialInstance, first
# processing it to produce a cumulative distribution function.
binomialCumulative = [binomialInstance.cdf(i) for i in range(size + 1)]

# this produces a plot of dots: first argument is x-axis (just
# subsequent integers), second argument is our table.
pyplot.plot([i for i in range(len(binomialTable))], binomialTable, 'ro')
pyplot.figure()
pyplot.plot([i for i in range(len(binomialCumulative))], binomialCumulative, 'ro')

# now, we can cheaply draw a sample from our distribution.
# we use bisect to locate a uniform random number in the cumulative table.
# THIS WORKS IN O(log(n)).
cutOff = random.random()  # a single float in [0, 1), not a one-element array
print("this is our cut-off value: " + str(cutOff))
print("this is a number of successful trials: " + str(bisect.bisect(binomialCumulative, cutOff)))
pyplot.show()
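
For readers who want to try the same idea without the NumPy, SciPy and matplotlib dependencies, here is a minimal standard-library sketch. It is not the code from this answer: the names `build_cdf` and `sample` are made up for illustration, it assumes Python 3.8+ for `math.comb`, and it is only meant for modest n, since the exact binomial coefficients become too large to convert to floats for big n.

```python
import bisect
import itertools
import math
import random


def build_cdf(n, p):
    """Return [P(X <= k) for k in 0..n] for X ~ Binomial(n, p). Costs O(n) terms."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return list(itertools.accumulate(pmf))


def sample(cdf, rng=random):
    """Draw one success count by inverse transform sampling. Costs O(log(n))."""
    u = rng.random()
    # bisect_right counts the table entries <= u, which is the sampled value.
    # The clamp guards against the last entry rounding to just below 1.0.
    return min(bisect.bisect_right(cdf, u), len(cdf) - 1)


if __name__ == "__main__":
    cdf = build_cdf(6, 0.3)  # build the table once
    print([sample(cdf) for _ in range(10)])  # then draw as often as needed
```

The table is built once and reused, so only the first call pays the O(n) cost.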

Answer 3

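As other commenters have mentioned, you can use the binomial distribution. However, since you are dealing with a very large number of samples, you should consider using the normal approximation.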

Is there an O(1) algorithm for generating the results of a series of random events?
