Counting the frequency of numeric data in range intervals

Time: 2017-03-01 22:27:00

Tags: python python-3.x

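I'm trying to improve my code, which sorts randomly generated numbers into range intervals so that I can analyze the accuracy of a random number generator. At the moment the sorting is done with 20 elif statements (I only have introductory knowledge of Python), so my code takes a long time to run. How can I sort numeric data into intervals more efficiently and store only the frequency of the numbers in each interval?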

from datetime import datetime
startTime = datetime.now()
def test_rand(points):
    import random
    d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20 = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    # these variables will be used to count frequency of numbers into 20 intervals: (-10,-9], (-9,-8] ... etc
    g1,g2,g3,g4,g5,g6,g7,g8,g9,g10,g11,g12,g13,g14,g15,g16,g17,g18,g19,g20 = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    # these variables will be used to count frequency of every 20 numbers into 20 intervals: (-200,-180], (-180,-160] ... etc
    y = 0  # running sum of the current group of 20 values
    n = 0  # number of values in the current group
    for i in range(points):
        x = random.uniform(-10.0, 10.0)
        y += x
        n += 1
        if n == 20:
            # bin the sum of the last 20 values: (-200,-180], (-180,-160], ..., (180,200]
            if y <= -180:
                g1 += 1
            elif y <= -160:
                g2 += 1
            elif y <= -140:
                g3 += 1
            elif y <= -120:
                g4 += 1
            elif y <= -100:
                g5 += 1
            elif y <= -80:
                g6 += 1
            elif y <= -60:
                g7 += 1
            elif y <= -40:
                g8 += 1
            elif y <= -20:
                g9 += 1
            elif y <= 0:
                g10 += 1
            elif y <= 20:
                g11 += 1
            elif y <= 40:
                g12 += 1
            elif y <= 60:
                g13 += 1
            elif y <= 80:
                g14 += 1
            elif y <= 100:
                g15 += 1
            elif y <= 120:
                g16 += 1
            elif y <= 140:
                g17 += 1
            elif y <= 160:
                g18 += 1
            elif y <= 180:
                g19 += 1
            else:
                g20 += 1
            y = 0  # start the next group of 20
            n = 0

        # bin the single value: (-10,-9], (-9,-8], ..., (9,10]
        if x <= -9:
            d1 += 1
        elif x <= -8:
            d2 += 1
        elif x <= -7:
            d3 += 1
        elif x <= -6:
            d4 += 1
        elif x <= -5:
            d5 += 1
        elif x <= -4:
            d6 += 1
        elif x <= -3:
            d7 += 1
        elif x <= -2:
            d8 += 1
        elif x <= -1:
            d9 += 1
        elif x <= 0:
            d10 += 1
        elif x <= 1:
            d11 += 1
        elif x <= 2:
            d12 += 1
        elif x <= 3:
            d13 += 1
        elif x <= 4:
            d14 += 1
        elif x <= 5:
            d15 += 1
        elif x <= 6:
            d16 += 1
        elif x <= 7:
            d17 += 1
        elif x <= 8:
            d18 += 1
        elif x <= 9:
            d19 += 1
        else:
            d20 += 1

    return d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20,g1,g2,g3,g4,g5,g6,g7,g8,g9,g10,g11,g12,g13,g14,g15,g16,g17,g18,g19,g20

print(test_rand(100000000))    

print (datetime.now() - startTime)

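The code performs two tasks with the random numbers. The first sorts the numbers into 20 intervals (so each interval should receive 5% of the numbers). The second sums every 20 generated numbers and sorts those sums into 20 new intervals (a normal curve should be observed).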
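Both expectations can be turned into concrete per-bin shares to compare the counts against. The sketch below is not from the thread: for single uniform(-10, 10) values each of the 20 equal bins should get 1/20 = 5%, and for sums of 20 such values it uses a normal approximation (central limit theorem) with mean 0 and standard deviation sqrt(20 · 20²/12) ≈ 25.8, evaluated at the same boundaries as the answer's `occ2`. The function names are hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def expected_uniform_share(n_bins=20):
    """Equal-width bins covering the whole range of a uniform draw each get 1/n_bins."""
    return [1 / n_bins] * n_bins


def expected_sum_shares(n_terms=20, low=-10.0, high=10.0, edges=None):
    """Approximate bin shares for the sum of n_terms uniform(low, high) draws.

    Normal approximation (central limit theorem): the sum has mean
    n*(low+high)/2 and variance n*(high-low)**2/12.
    """
    if edges is None:
        # same 19 inner boundaries as occ2: -180, -160, ..., 180 -> 20 bins
        edges = [-180 + 20 * i for i in range(19)]
    mean = n_terms * (low + high) / 2
    sigma = (n_terms * (high - low) ** 2 / 12) ** 0.5
    dist = NormalDist(mean, sigma)
    # cumulative probability at each boundary, padded with 0 and 1 for the open-ended outer bins
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]


shares = expected_sum_shares()
for i, p in enumerate(shares):
    print(i, round(p, 4))
```

With a standard deviation of about 25.8, almost all of the probability lands in the central bins, so the outermost width-20 intervals are expected to stay essentially empty. The true sum is also bounded by ±200, which the normal approximation ignores.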

@tristan I have modified your code to do the above:

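from bisect import bisect
from random import uniform

def test_rand(points):
    occ1 = [-9 + 1 * i for i in range(19)]     # inner boundaries of the 20 bins for single values
    occ2 = [-180 + 20 * i for i in range(19)]  # inner boundaries of the 20 bins for 20-value sums
    counter1 = {i: 0 for i in range(20)}
    counter2 = {i: 0 for i in range(20)}
    val_20 = 0
    for idx in range(points):
        val_1 = uniform(-10, 10)
        counter1[bisect(occ1, val_1)] += 1     # every single value is binned
        val_20 += val_1
        if (idx + 1) % 20 == 0:                # every 20th value: bin the sum, then reset it
            counter2[bisect(occ2, val_20)] += 1
            val_20 = 0
    return counter1, counter2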

Although this approach only saves 6 seconds (1:54 -> 1:48), it is better organized and easier to read. Thanks for your help!
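The question's stated goal is to judge the generator's accuracy from these counts. One standard way to reduce the single-value counts to one number, which is an assumption here and not something proposed in the thread, is Pearson's chi-square goodness-of-fit statistic against an equal expected count per bin. A minimal sketch: `chi_square_uniform` is a hypothetical helper, and the example counts are invented for illustration, not results from running the code above.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic of observed bin counts against equal expected counts.

    Accepts either a dict (bin index -> count, like counter1) or a plain sequence.
    """
    observed = list(counts.values()) if isinstance(counts, dict) else list(counts)
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Invented counts for 20 bins (2000 draws in total), shaped like counter1
example = {i: 100 for i in range(20)}
example[0], example[19] = 90, 110

stat = chi_square_uniform(example)
print(stat)  # 2.0
```

With 20 bins there are 19 degrees of freedom, and the 5% critical value of the chi-square distribution is about 30.14, so a statistic well below that gives no evidence against uniformity. The approximation is usually considered reliable when each expected count is at least about 5.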

1 Answer

Answer 0 (score: 2)

Assuming the data can always be assigned to one of your intervals (which you can check beforehand), bisect.bisect() is an efficient and compact approach:

from bisect import bisect
from random import randint

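occ1 = [-9 + 1 * i for i in range(19)]     # 19 inner boundaries -9, -8, ..., 9 -> 20 bins
occ2 = [-180 + 20 * i for i in range(19)]  # 19 inner boundaries -180, -160, ..., 180 -> 20 bins
data = [randint(-10, 10) for _ in range(100)]
counter1, counter2 = {i: 0 for i in range(20)}, {i: 0 for i in range(20)}  # bin index -> count

for idx, element in enumerate(data):
    if (idx + 1) % 20 == 0:
        # every 20th element is checked against the coarse intervals
        counter2[bisect(occ2, element)] += 1
    else:
        counter1[bisect(occ1, element)] += 1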

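The bisect() function returns the position at which an element would have to be inserted into a sorted list such as occ to keep it in order. With 19 values in occ, there are 20 distinct positions where a value can be inserted: before the first element, between any two elements, or after the last one. These correspond to your 20 intervals. The only thing to keep in mind is that a value below the lower bound or above the upper bound of your intervals will still be assigned to the lowest or highest interval. Generating random numbers that stay within the interval bounds prevents this.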
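The boundary behaviour described above can be checked directly with the standard `bisect` module. Note that `bisect` is an alias of `bisect_right`, so a value exactly on a boundary goes to the bin on its right; `bisect_left` instead gives the upper-inclusive (-10, -9], (-9, -8], ... convention written in the question's code comments. For uniform floats an exact hit is rare, but it matters for integer data such as the `randint` values in the first example. The helper `checked_bin` and its default bounds are hypothetical, a sketch of the "check beforehand" idea.

```python
from bisect import bisect, bisect_left

occ1 = [-9 + 1 * i for i in range(19)]  # -9, -8, ..., 9

# bisect (= bisect_right): a boundary value goes to the bin on its right,
# so the bins are [-9, -8), [-8, -7), ... with the lower bound included
print(bisect(occ1, -9.5), bisect(occ1, -9), bisect(occ1, 0.5), bisect(occ1, 9.5))
# 0 1 10 19

# bisect_left: a boundary value stays in the bin on its left, i.e. (a, b] intervals
print(bisect_left(occ1, -9), bisect_left(occ1, 9))
# 0 18

# Out-of-range values are silently placed in the first or last bin
print(bisect(occ1, -50), bisect(occ1, 50))
# 0 19


def checked_bin(value, edges, low=-10, high=10):
    """Hypothetical helper: reject values outside [low, high] before binning."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    return bisect(edges, value)
```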

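From your question, I'm not sure whether you want to accumulate random numbers, or just check a list of points where every 20th value gets a different check. The solution can easily be adapted to accumulate random numbers until 20 iterations have been reached: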

from bisect import bisect
from random import uniform

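points, value = 100000000, 0
occ1 = [-9 + 1 * i for i in range(19)]     # 20 bins of width 1 for single values
occ2 = [-180 + 20 * i for i in range(19)]  # 20 bins of width 20 for 20-value sums
counter1, counter2 = {i: 0 for i in range(20)}, {i: 0 for i in range(20)}

for idx in range(points):
    x = uniform(-10, 10)
    counter1[bisect(occ1, x)] += 1  # bin every individual value
    value += x                      # accumulate the running sum
    if (idx + 1) % 20 == 0:         # after every 20 values, bin the sum and reset it
        counter2[bisect(occ2, value)] += 1
        value = 0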

On my machine this runs in 100 seconds for 100M points.
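Because all 20 intervals have the same width, another option, not used in the answer, is to compute the bin index arithmetically with floor division instead of searching a list of boundaries. A sketch under that assumption: `bin_index` and `count_bins` are hypothetical names, and plain lists replace the dicts since the keys are just 0..19. Up to floating-point rounding at exact boundaries, it assigns the same bins as `bisect(occ1, ...)` and `bisect(occ2, ...)`, including placing out-of-range values in the outermost bins. Whether it is actually faster than `bisect` is worth measuring (for example with `timeit`) on the target machine.

```python
from random import uniform


def bin_index(value, low, width, n_bins):
    """Index of the equal-width bin [low + k*width, low + (k+1)*width) containing value.

    Values outside the range are clamped into the first or last bin, as bisect does.
    """
    k = int((value - low) // width)
    return min(max(k, 0), n_bins - 1)


def count_bins(points):
    counts1 = [0] * 20  # single values: 20 bins of width 1 starting at -10
    counts2 = [0] * 20  # sums of 20 values: 20 bins of width 20 starting at -200
    total = 0.0
    for idx in range(points):
        x = uniform(-10, 10)
        counts1[bin_index(x, -10, 1, 20)] += 1
        total += x
        if (idx + 1) % 20 == 0:  # every 20 values: bin the sum, then reset it
            counts2[bin_index(total, -200, 20, 20)] += 1
            total = 0.0
    return counts1, counts2


c1, c2 = count_bins(100_000)
print(c1)
print(c2)
```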