Proportional sampling in Python

Time: 2019-07-23 11:09:37

Tags: python python-3.x

I ran into a problem that explicitly asks me not to use numpy or pandas.

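Problem: randomly select an element from list A, with probability proportional to its magnitude. Suppose we repeat the same experiment 100 times with replacement; in each experiment, print one number randomly selected from A.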

Ex 1: A = [0 5 27 6 13 28 100 45 10 79]
let f(x) denote the number of times x is selected in 100 experiments.
f(100) > f(79) > f(45) > f(28) > f(27) > f(13) > f(10) > f(6) > f(5) > f(0)
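Under this rule the chance of drawing A[k] in a single experiment is A[k] / sum(A), so the ordering of f(x) above describes what happens on average; a single run of 100 draws is not guaranteed to reproduce it (6 and 5, for instance, have expected counts of only about 1.9 and 1.6). A short sketch of the exact probabilities and expected counts; the names probs and expected are illustrative, not from the question:

```python
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
N = 100  # number of experiments

total = sum(A)
# Probability of picking each element in one experiment
probs = {x: x / total for x in A}
# Expected number of times each element is picked in N experiments
expected = {x: N * p for x, p in probs.items()}

for x in sorted(A, reverse=True):
    print(f"{x:>3}: p = {probs[x]:.4f}, expected count = {expected[x]:.2f}")
```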

First, I summed all the elements of list A.

Then I divided each element of A by the sum (to normalize it) and stored each of these values in another list (d_dash).

Then I created another empty list (d_bar) to hold the cumulative sum of the elements of d_dash.

Next, I created a variable r = random.uniform(0.0, 1.0) and compared it against d_dash[k] for each k up to the length of d_dash: if r <= d_dash[k], return A[k].
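The steps described above (normalize, accumulate, compare with r) can be sketched as below. This is a minimal version under stated assumptions, not the poster's code: proportional_sample is a hypothetical name, the lists grow with append instead of being indexed while empty, the comparison is made against the cumulative list d_bar, and the scan stops at the first match. It uses random.random() with a strict r < d_bar[k], so the zero-weight element 0 can never be returned (with r <= d_bar[0] it would be picked if r came out as exactly 0.0).

```python
import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def proportional_sample(A):
    """Hypothetical helper: one draw from A with probability A[k] / sum(A)."""
    total = sum(A)

    # Normalize: d_dash[k] = A[k] / total (append grows the empty list)
    d_dash = []
    for x in A:
        d_dash.append(x / total)

    # Cumulative sum: d_bar[k] = d_dash[0] + ... + d_dash[k]
    d_bar = []
    running = 0.0
    for p in d_dash:
        running += p
        d_bar.append(running)

    # Return A[k] for the first k whose cumulative value exceeds r
    r = random.random()  # r in [0.0, 1.0)
    for k in range(len(d_bar)):
        if r < d_bar[k]:
            return A[k]
    return A[-1]  # only reached if rounding left d_bar[-1] <= r

# 100 experiments with replacement
for _ in range(100):
    print(proportional_sample(A))
```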

However, I get a "list index out of range" error at d_dash[j].append((A[j]/sum)). I'm not sure what the problem is, since I'm not going past the end of either d_dash or A.
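d_dash starts out as an empty list, so d_dash[j] with j == 0 already fails: there is no element 0 to call .append on. The fix is to call append on the list itself, d_dash.append(A[j] / sum). The same problem would appear next at d_bar[0] = 0, which assigns into the empty d_bar. A small reproduction of both errors:

```python
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
total = sum(A)

# Reading index 0 of an empty list: this is what d_dash[j] does when j == 0
d_dash = []
try:
    d_dash[0].append(A[0] / total)
except IndexError as exc:
    read_error = str(exc)
print("d_dash[0]:", read_error)  # d_dash[0]: list index out of range

# Assigning to index 0 of an empty list: this is what d_bar[0] = 0 does
d_bar = []
try:
    d_bar[0] = 0
except IndexError as exc:
    write_error = str(exc)
print("d_bar[0] = 0:", write_error)  # d_bar[0] = 0: list assignment index out of range
```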

Is my logic even correct? A better way to do this would be much appreciated.

Thanks.

import random

A = [0,5,27,6,13,28,100,45,10,79]

def propotional_sampling(A):
    sum=0
    for i in range(len(A)):
        sum = sum + A[i]

    d_dash=[]

    for j in range(len(A)):
        d_dash[j].append((A[j]/sum))

    #cumulative sum

    d_bar =[]
    d_bar[0]= 0

    for k in range(len(A)):
        d_bar[k] = d_bar[k] + d_dash[k]

    r = random.uniform(0.0,1.0)
    number=0

    for p in range(len(d_bar)):
        if(r<=d_bar[p]):
            number=d_bar[p]
    return number

def sampling_based_on_magnitued():
    for i in range(1,100):
        number = propotional_sampling(A)
        print(number)

sampling_based_on_magnitued()

3 answers:

Answer 0 (score: 0)

The cumulative sum can be computed with itertools.accumulate. The loop:

for p in range(len(d_bar)):
    if(r<=d_bar[p]):
        number=d_bar[p]

can be replaced with bisect.bisect():

import random
from itertools import accumulate
from bisect import bisect

A = [0,5,27,6,13,28,100,45,10,79]

def propotional_sampling(A, n=100):
    # calculate cumulative sum from A:
    cum_sum = [*accumulate(A)]
    # cum_sum = [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]

    out = []
    for _ in range(n):
        i = random.random()                     # i = [0.0, 1.0)
        idx = bisect(cum_sum, i*cum_sum[-1])    # get index to list A
        out.append(A[idx])

    return out

print(propotional_sampling(A))

Prints (for example):

[10, 100, 100, 79, 28, 45, 45, 27, 79, 79, 79, 79, 100, 27, 100, 100, 100, 13, 45, 100, 5, 100, 45, 79, 100, 28, 79, 79, 6, 45, 27, 28, 27, 79, 100, 79, 79, 28, 100, 79, 45, 100, 10, 28, 28, 13, 79, 79, 79, 79, 28, 45, 45, 100, 28, 27, 79, 27, 45, 79, 45, 100, 28, 100, 100, 5, 100, 79, 28, 79, 13, 100, 100, 79, 28, 100, 79, 13, 27, 100, 28, 10, 27, 28, 100, 45, 79, 100, 100, 100, 28, 79, 100, 45, 28, 79, 79, 5, 45, 28]
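Why bisect picks the right index: bisect() returns the insertion point to the right of any equal entries, so for k >= 1 every x in [cum_sum[k-1], cum_sum[k]) maps to index k, an interval of width A[k]; A[0] = 0 gets an empty interval and is never chosen. The sketch below checks a few boundary values and, for comparison, calls random.choices from the standard library (Python 3.6+), which accepts the weights directly; CPython implements it with the same accumulate-and-bisect idea.

```python
import random
from bisect import bisect
from itertools import accumulate

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
cum_sum = list(accumulate(A))
print(cum_sum)  # [0, 5, 32, 38, 51, 79, 179, 224, 234, 313]

# bisect() returns the insertion point to the right of equal entries,
# so for k >= 1, x in [cum_sum[k-1], cum_sum[k]) maps to index k.
for x in (0.0, 4.9, 5.0, 178.9, 179.0, 312.9):
    idx = bisect(cum_sum, x)
    print(f"x = {x}: index {idx} -> A[{idx}] = {A[idx]}")

# Standard-library alternative: weights are passed directly
sample = random.choices(A, weights=A, k=100)
print(sample)
```

Here 0.0 and 4.9 map to index 1 (value 5), 5.0 to index 2 (27), 178.9 to index 6 (100), 179.0 to index 7 (45) and 312.9 to index 9 (79).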

Answer 1 (score: 0)

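The "list index out of range" message appears because you create empty lists and then index into them: d_dash[j] on the empty d_dash = [] (the line in your error message), and likewise d_bar[0] = 0 and d_bar[k] = d_bar[k] + d_dash[k] on the empty d_bar = []. An empty list has no index 0, so both reading and assigning by index fail. I recreated the list with a list comprehension instead. First, define it like this: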

d_bar = [0 for i in range(len(A))]
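The comprehension builds a list of len(A) zeros, so indices 0 to len(A) - 1 exist before they are assigned; [0] * len(A) is an equivalent shorthand. Note that with zeros preallocated, the question's d_bar[k] = d_bar[k] + d_dash[k] only copies d_dash; a cumulative sum needs each entry to build on the previous one. A minimal sketch, reusing the question's names d_dash and d_bar:

```python
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
total = sum(A)
d_dash = [x / total for x in A]

d_bar = [0 for i in range(len(A))]
assert d_bar == [0] * len(A)  # same list of zeros

d_bar[0] = d_dash[0]
for k in range(1, len(A)):
    d_bar[k] = d_bar[k - 1] + d_dash[k]  # previous total plus current weight

print(d_bar[-1])  # close to 1.0 (floating-point rounding may shift the last digits)
```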

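Also, I think this code will always return (roughly) 1: the loop never breaks, so number ends up as the last cumulative value d_bar[p], which is about 1.0, and that is a cumulative value rather than an element of A. You can fix this by adding a break and returning A[p]. Here is an updated version of your code: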

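import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

def pick_a_number_from_list(A):
    # Normalize each element by the total
    total = 0
    for i in A:
        total += i
    A_norm = []
    for j in A:
        A_norm.append(j / total)

    # Cumulative sum of the normalized values
    A_cum = [0 for i in range(len(A))]
    A_cum[0] = A_norm[0]
    for k in range(len(A_norm) - 1):
        A_cum[k + 1] = A_cum[k] + A_norm[k + 1]

    r = random.uniform(0.0, 1.0)
    number = A[-1]  # fallback in case rounding leaves A_cum[-1] slightly below r

    # Return the element at the first cumulative value >= r
    for p in range(len(A_cum)):
        if r <= A_cum[p]:
            number = A[p]
            break
    return number

def sampling_based_on_magnitued():
    # 100 experiments (range(1, 100) would run only 99)
    for i in range(100):
        number = pick_a_number_from_list(A)
        print(number)

sampling_based_on_magnitued()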
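To see why the break matters, the sketch below compares the two loops for a fixed r (last_match and first_match are hypothetical names). Without break, every later match overwrites number, and since the final cumulative value is about 1.0 it matches every r, so the loop always ends on the last entry:

```python
A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
total = sum(A)
A_cum = []
running = 0.0
for x in A:
    running += x / total
    A_cum.append(running)

def last_match(r):
    """Original loop: no break, so later matches overwrite earlier ones."""
    number = 0
    for p in range(len(A_cum)):
        if r <= A_cum[p]:
            number = A_cum[p]
    return number

def first_match(r):
    """Fixed loop: stop at the first cumulative value >= r and return A[p]."""
    for p in range(len(A_cum)):
        if r <= A_cum[p]:
            return A[p]
    return A[-1]

print(last_match(0.5), first_match(0.5))    # about 1.0 vs 100
print(last_match(0.05), first_match(0.05))  # about 1.0 vs 27
```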

Answer 2 (score: 0)

Here is code that does the same thing:

import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]

# Sum of all the elements in the list
S = sum(A)

# Normalized values (each element divided by the sum)
norm_sum = [ele / S for ele in A]

# Cumulative normalized sum
cum_norm_sum = []
cum_norm_sum.append(norm_sum[0])
for itr in range(1, len(norm_sum)):
    cum_norm_sum.append(cum_norm_sum[-1] + norm_sum[itr])

def prop_sampling(cum_norm_sum):
    """
    This function returns an element
    with proportional sampling.
    """
    r = random.random()  # r in [0.0, 1.0)
    for itr in range(len(cum_norm_sum)):
        if r < cum_norm_sum[itr]:
            return A[itr]
    return A[-1]  # fallback if rounding left cum_norm_sum[-1] <= r

# Sampling 1000 elements from the given list with proportional sampling
sampled_elements = []
for itr in range(1000):
    sampled_elements.append(prop_sampling(cum_norm_sum))
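Because cum_norm_sum is built by adding floating-point fractions, its last entry is not guaranteed to be exactly 1.0. If r ever exceeds it, a loop with no statement after it falls off the end and the function returns None. Returning the last element as a fallback closes that gap (here A[-1] = 79 has positive weight). A sketch with the hypothetical name prop_sampling_safe, where r can be passed in for testing:

```python
import random

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
S = sum(A)
cum_norm_sum = []
running = 0.0
for ele in A:
    running += ele / S
    cum_norm_sum.append(running)

def prop_sampling_safe(cum_norm_sum, r=None):
    """Proportional sampling with a fallback for rounding in the last entry."""
    if r is None:
        r = random.random()  # r in [0.0, 1.0)
    for itr in range(len(cum_norm_sum)):
        if r < cum_norm_sum[itr]:
            return A[itr]
    return A[-1]  # reached only if rounding left cum_norm_sum[-1] <= r

print(cum_norm_sum[-1])
print(prop_sampling_safe(cum_norm_sum, 0.9999999999999999))  # 79
```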

The figure below shows how often each element appears among the sampled points:

[Figure: frequency of each element among the 1000 sampled elements (image not available)]

Clearly, the number of times each element appears is roughly proportional to its magnitude.
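Since the plot is not reproduced here, the same check can be done in text: count how often each element is drawn and compare it with its expected share A[k] / sum(A). A sketch using the answer's linear-scan sampler; the 100,000 draws (instead of 1000) and the seed are arbitrary choices that make the comparison tighter and repeatable, and building cum_norm_sum by dividing the integer running totals keeps its last entry exactly 1.0:

```python
import random
from collections import Counter
from itertools import accumulate

A = [0, 5, 27, 6, 13, 28, 100, 45, 10, 79]
S = sum(A)

# Divide integer running totals by S; the last entry is exactly 313 / 313 == 1.0
cum_norm_sum = [c / S for c in accumulate(A)]

def prop_sampling(cum_norm_sum):
    r = random.random()  # r in [0.0, 1.0)
    for itr, c in enumerate(cum_norm_sum):
        if r < c:
            return A[itr]
    return A[-1]

random.seed(0)  # arbitrary seed, only so runs are repeatable
n = 100_000
counts = Counter(prop_sampling(cum_norm_sum) for _ in range(n))

for x in sorted(A, reverse=True):
    print(f"{x:>3}: observed {counts[x] / n:.4f}, expected {x / S:.4f}")
```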