Question

I want to specify the probability density function of a distribution and then draw N random numbers from that distribution in Python. How do I go about doing that?

Answer 1

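In general, you want to have the inverse cumulative distribution function. Once you have that, generating random numbers that follow the distribution is simple: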

import random

def sample(n):
    return [ icdf(random.random()) for _ in range(n) ]

Or, if you use NumPy:

import numpy as np

def sample(n):
    return icdf(np.random.random(n))

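In both cases, icdf is the inverse cumulative distribution function, which accepts a value between 0 and 1 and returns the corresponding value from the distribution.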

To illustrate the nature of icdf, take a simple uniform distribution between the values 10 and 12 as an example:

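The probability density function is 0.5 between 10 and 12 and zero elsewhere
The cumulative distribution function is 0 below 10 (no samples below 10), 1 above 12 (no samples above 12), and increases linearly between those values (it is the integral of the PDF)
The inverse cumulative distribution function is only defined between 0 and 1. At 0 it is 10, at 1 it is 12, and it changes linearly in between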
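For this uniform example the inverse CDF can be written in closed form. A minimal sketch, assuming the hypothetical name `icdf_uniform` and the default bounds 10 and 12 from the example above; NumPy's `default_rng` is used for the uniform inputs:

```python
import numpy as np

def icdf_uniform(u, low=10.0, high=12.0):
    """Inverse CDF of the uniform distribution on [low, high] (illustrative name)."""
    # CDF(x) = (x - low) / (high - low); solving CDF(x) = u for x gives:
    return low + (high - low) * np.asarray(u)

def sample(n, rng=None):
    # Feed uniform numbers from [0, 1) through the inverse CDF.
    rng = np.random.default_rng() if rng is None else rng
    return icdf_uniform(rng.random(n))
```

Here `icdf_uniform(0.0)` gives the lower bound and `icdf_uniform(1.0)` the upper bound, matching the description above.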

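Of course, the difficult part is obtaining the inverse cumulative distribution function. It really depends on your distribution: sometimes you may have an analytical function, and sometimes you may have to resort to interpolation. Numerical methods can be useful here, since numerical integration can be used to build the CDF and interpolation can be used to invert it.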
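The numerical route might look roughly like the sketch below. It assumes a hypothetical helper name `make_numeric_icdf`, a PDF that accepts NumPy arrays and is positive on the open interval, and a fixed grid of `num_points` points; the accuracy depends on that grid.

```python
import numpy as np

def make_numeric_icdf(pdf, low, high, num_points=10_001):
    """Build an approximate inverse CDF for `pdf` restricted to [low, high]."""
    x = np.linspace(low, high, num_points)
    p = pdf(x)
    # Cumulative trapezoidal integration of the PDF gives the unnormalised CDF.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (p[1:] + p[:-1]) * np.diff(x))))
    cdf /= cdf[-1]  # normalise so that CDF(high) == 1
    # Invert by interpolation: swap the roles of x and CDF(x).
    # np.interp expects increasing x-coordinates, hence the positivity assumption.
    return lambda u: np.interp(u, cdf, x)

# Usage: the PDF f(x) = 2x on [0, 1] has the exact inverse CDF sqrt(u).
icdf = make_numeric_icdf(lambda x: 2.0 * x, 0.0, 1.0)
samples = icdf(np.random.default_rng().random(5))
```

The returned `icdf` can be dropped into either `sample` function from this answer.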

Answer 2

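Here is my function for drawing a single random number distributed according to a given probability density function. I use a Monte Carlo-like approach. Of course, n random numbers can be generated by calling this function n times.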

    import numpy as np

    def draw_random_number_from_pdf(pdf, interval, pdfmax=1, integers=False, max_iterations=10000):
        """
        Draws a random number from a given probability density function.

        Parameters
        ----------
            pdf       -- the probability density function, a callable of the form P = pdf(x)
            interval  -- the resulting random number is restricted to this interval
            pdfmax    -- the maximum of the probability density function
            integers  -- boolean, indicating if the result is desired as an integer
            max_iterations -- maximum number of 'tries' to find a combination of random numbers (rand_x, rand_y) located below the function value calc_y = pdf(rand_x).

        Returns a single random number distributed according to the pdf.
        """
        for i in range(max_iterations):
            if integers:
                # note: np.random.randint excludes the upper bound interval[1]
                rand_x = np.random.randint(interval[0], interval[1])
            else:
                rand_x = (interval[1] - interval[0]) * np.random.random() + interval[0]  # (b - a) * random_sample() + a

            rand_y = pdfmax * np.random.random()
            calc_y = pdf(rand_x)

            # accept rand_x if the point (rand_x, rand_y) lies on or below the pdf curve
            if rand_y <= calc_y:
                return rand_x

        # str() is needed here: concatenating an int to a str raises a TypeError
        raise Exception("Could not find a matching random number within pdf in " + str(max_iterations) + " iterations.")

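In my opinion, this solution performs better than the other solutions if you do not have to draw a very large number of random variates. Another benefit is that you only need the PDF and avoid computing the CDF, the inverse CDF, or weights.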
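If many numbers are needed, the same accept/reject idea can be applied to whole batches instead of one value at a time. This is only a sketch and not part of the function above: the name `sample_from_pdf` and the `max_rounds` parameter are hypothetical, the PDF is assumed to accept NumPy arrays, and `pdfmax` is assumed to be at least as large as the PDF's maximum on the interval.

```python
import numpy as np

def sample_from_pdf(pdf, interval, n, pdfmax=1.0, rng=None, max_rounds=10_000):
    """Draw n numbers from `pdf` on `interval` by batched accept/reject sampling."""
    rng = np.random.default_rng() if rng is None else rng
    low, high = interval
    accepted, count = [], 0
    for _ in range(max_rounds):
        x = rng.uniform(low, high, size=n)    # candidate values
        y = rng.uniform(0.0, pdfmax, size=n)  # heights in [0, pdfmax)
        keep = x[y <= pdf(x)]                 # keep points on or under the curve
        accepted.append(keep)
        count += keep.size
        if count >= n:
            return np.concatenate(accepted)[:n]
    raise RuntimeError(f"Could not draw {n} samples within {max_rounds} rounds.")

# Usage: 1000 draws from the PDF f(x) = 2x on [0, 1], whose maximum is 2.
draws = sample_from_pdf(lambda x: 2.0 * x, (0.0, 1.0), 1000, pdfmax=2.0)
```

The acceptance rate is the area under the PDF divided by the area of the bounding box, so a `pdfmax` far above the true maximum wastes candidates.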
