Creating a Poisson-like distribution from N random numbers whose sum is a constant (C)

Time: 2018-08-26 19:31:51

Tags: python pandas numpy random numbers

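I want to generate random numbers from a Poisson-like distribution such that the generated numbers sum to 1000 and each number lies between 3 and 30.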

I can generate the random numbers with numpy:

 
In [2]: np.random.poisson(5, 150)
array([ 4,  4,  6,  4,  8,  6,  4,  2,  6,  8,  8,  8,  1,  4,  3,  4,  1,
        3,  7,  6,  7,  4,  5,  5,  7,  6,  5,  3,  3,  5,  4,  6,  2,  0,
        3,  5,  6,  2,  5,  2,  4,  7,  4,  7,  8,  5,  6,  1,  4,  4,  7,
        4,  7,  2,  7,  4,  3,  8, 10,  2,  5,  7,  6,  3,  5,  7,  8,  5,
        4,  7,  8,  8,  2,  2, 10,  6,  3,  5,  2,  5,  5,  6,  4,  6,  4,
        0,  4,  3,  5,  8,  6,  7,  4,  4,  4,  3,  3,  4,  4,  6,  7,  6,
        3,  9,  7,  7,  4,  5,  2,  4,  3,  6,  5,  6,  3,  6,  8,  9,  6,
        3,  4,  4,  7,  3,  9, 12,  4,  5,  5,  7,  6,  5,  2, 10,  1,  3,
        4,  4,  6,  5,  4,  4,  7,  5,  6,  5,  7,  2,  5,  5])


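However, I would like to add two more constraints:

- Each random number should be at least 3 and at most 30.
- The sum of the generated random numbers should be 1000.

I know that once I manipulate the numbers this way, the result may no longer be an exact Poisson distribution. I still want something Poisson-like, but with the controls described above.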

3 answers:

Answer 0 (score: 2)

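Let me write something that may or may not work; we'll see.

A property of the Poisson distribution is that its single parameter, λ, is both the mean and the variance. Let's try another distribution that actually sums to 1000 and is close enough to Poisson.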

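I would try the multinomial distribution. Consider sampling 200 numbers from a multinomial. We shift each sampled number by 3, so the minimum bound is satisfied. This means the total for the multinomial sample (the parameter n) must equal 1000 - 3 * 200 = 400. Each probability p_i is set to 1/200.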

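The multinomial mean is therefore E[x_i] = n * p_i = 400/200 = 2. The multinomial variance is n * p_i * (1 - p_i), and since p_i is small, the factor (1 - p_i) is very close to 1. The sampled integers are therefore Poisson-like, with the mean equal to the variance. The problem is that after the shift the mean becomes 5 while the variance stays at about 2.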
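To make the arithmetic above concrete, here is a small sketch that recomputes the numbers with plain Python. The variable names are my own; the values (200 numbers, total 1000, shift 3) are the ones used in this answer.

```python
# Worked arithmetic for the shifted-multinomial idea (values from the answer).
N = 200            # how many numbers to generate
total = 1000       # required sum
shift = 3          # lower bound, added to every sample

n = total - shift * N      # multinomial trials left after the shift
p = 1.0 / N                # equal probability for each of the N cells

mean = n * p               # expected value of each cell before the shift
var = n * p * (1 - p)      # variance of each cell, nearly equal to the mean

shifted_mean = mean + shift   # the shift moves the mean but not the variance
print(n, mean, var, shifted_mean)
```

The variance comes out at about 1.99 against a shifted mean of 5, which is the mismatch mentioned above.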

Anyway, some code.

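import numpy as np

N = 200                    # how many numbers to generate
shift = 3                  # lower bound, added to every sample
n = 1000 - N * shift       # multinomial trials left after the shift (400)
p = [1.0 / N] * N          # equal probability for every cell

# One multinomial draw; q has shape (1, N) and its entries sum to n
q = np.random.multinomial(n, p, size=1)
print(np.sum(q))           # always n = 400
print(np.mean(q))          # always n / N = 2.0
print(np.var(q))           # random, close to 2

# Shift so that every value is at least 3 and the total is 1000
result = q + shift
print(np.sum(result))      # always 1000
print(np.mean(result))     # always 5.0
print(np.var(result))      # unchanged by the shift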
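The code above guarantees the sum and the lower bound but never checks the upper bound of 30. With a mean of 5 an overshoot is extremely unlikely, but a redraw loop makes the bound explicit. The following is only a sketch: the function name `shifted_multinomial` and its parameters are my own, and it uses the standard library (n equally likely draws over the cells form one equal-probability multinomial sample) so it runs without NumPy.

```python
import random
from collections import Counter

def shifted_multinomial(total, count, low=3, high=30, rng=random):
    """Return `count` integers in [low, high] that sum to `total`.

    Hypothetical helper: an equal-probability multinomial shifted by `low`,
    redrawn until no value exceeds `high`.
    """
    if count <= 0 or not low * count <= total <= high * count:
        raise ValueError("total is not reachable within the bounds")
    n = total - low * count            # trials left after the shift
    while True:
        # n equally likely draws over `count` cells = one multinomial sample
        hits = Counter(rng.randrange(count) for _ in range(n))
        values = [low + hits[i] for i in range(count)]
        if max(values) <= high:        # reject the draw if a cell overflows
            return values

values = shifted_multinomial(1000, 200)
print(sum(values), min(values), max(values))
```

Note that the redraw loop can take a long time when `total` is close to `high * count`, because almost every draw is then rejected.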

Answer 1 (score: 2)

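Here is an alternative. It pre-allocates the minimum to each bin, calculates how many observations remain, and then draws a Poisson count for each bin in turn. The Poisson rate is dialed up or down according to how many observations and bins remain, and each draw is accepted or rejected based on the per-bin upper limit.

Since a Poisson is a count of the observations seen in an interval, any observations that were not allocated in the initial pass are assigned at random to bins that still have remaining capacity.

Here it is: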

import numpy as np

def make_poissonish(n, num_bins):
    # Raise instead of calling exit() so callers can handle a bad request
    if n > 30 * num_bins:
        raise ValueError("requested n exceeds 30 / bin")
    if n < 3 * num_bins:
        raise ValueError("requested n cannot fill 3 / bin")

    # Disperse minimum quantity per bin in all bins, then determine remainder
    lst = [3 for _ in range(num_bins)]
    number_remaining = n - num_bins * 3

    # Allocate counts to all bins using a truncated Poisson
    for i in range(num_bins):
        # dial the rate up or down depending on whether we're falling
        # behind or getting ahead in allocating observations to bins
        rate = number_remaining / float(num_bins - i)  # avg per remaining bin

        # keep generating until we meet the constraint requirement (acceptance/rejection)
        while True:
            x = np.random.poisson(rate)
            if x <= 27 and x <= number_remaining: break
        # Found an acceptable count, put it in this bin and move on
        lst[i] += x
        number_remaining -= x

    # If there are still observations remaining, disperse them
    # randomly across bins that have remaining capacity
    while number_remaining > 0:
        i = np.random.randint(0, num_bins)
        if lst[i] >= 30:    # not this one, it's already full!
            continue
        lst[i] += 1
        number_remaining -= 1
    return lst

Sample output:

result = make_poissonish(150, 10)
print(result)                    # => [16, 19, 11, 16, 21, 18, 12, 17, 8, 12]
print(sum(result))               # => 150

result = make_poissonish(50, 10)
print(result)                    # => [3, 5, 5, 4, 3, 3, 15, 3, 6, 3]
print(sum(result))               # => 50
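Since every run gives different numbers, it can help to check the constraints programmatically rather than by eye. Below is a minimal sketch of such a check; the helper name `satisfies_constraints` is my own, and the two lists are simply the sample outputs shown above.

```python
def satisfies_constraints(values, total, low=3, high=30):
    """Return True if the values sum to `total` and all lie in [low, high]."""
    return sum(values) == total and all(low <= v <= high for v in values)

# The two sample outputs shown above
print(satisfies_constraints([16, 19, 11, 16, 21, 18, 12, 17, 8, 12], 150))
print(satisfies_constraints([3, 5, 5, 4, 3, 3, 15, 3, 6, 3], 50))
```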

Answer 2 (score: 0)

You can do this easily with a while loop and the random module; it will do the job:

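from random import randint

TOTAL, LOW, HIGH = 1000, 3, 30
nums_sum = 0
nums_lst = []
while nums_sum < TOTAL:
    remaining = TOTAL - nums_sum
    if remaining <= HIGH:
        # the last number is whatever is left (always between LOW and HIGH)
        n = remaining
    else:
        # randint includes both endpoints; leave at least LOW for the
        # next draw so the final remainder never drops below the minimum
        n = randint(LOW, min(HIGH, remaining - LOW))
    nums_sum += n
    nums_lst.append(n)
print(nums_sum)
print(nums_lst)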

It's that simple.
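One rough way to judge how Poisson-like a list of numbers is: compare its variance with its mean, since for a Poisson the two are equal. The sketch below computes that ratio; the helper name `dispersion_index` is my own. For integers drawn uniformly from 3 to 30 the mean is 16.5 and the variance is (28^2 - 1)/12 = 65.25, so the ratio comes out near 4 rather than 1.

```python
import random
import statistics

def dispersion_index(values):
    """Variance divided by mean; a Poisson sample gives a value near 1."""
    return statistics.pvariance(values) / statistics.mean(values)

# Uniform integers on 3..30 (randint includes both endpoints)
uniform_draws = [random.randint(3, 30) for _ in range(10_000)]
print(dispersion_index(uniform_draws))
```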