Python: Faster alternative to numpy's random.choice()?

Posted: 2018-05-27 03:27:11

Tags: python numpy random

I am trying to sample 1000 numbers between 0 and 999, where a vector of weights determines the probability of picking each particular number:

import numpy as np
resampled_indices = np.random.choice(a=1000, size=1000, replace=True, p=weights)

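Unfortunately, this step has to run thousands of times inside a larger for loop, and np.random.choice appears to be the main speed bottleneck of the whole process. So I'm wondering whether there is a way to speed up np.random.choice, or an alternative method that gives the same results.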

2 answers:

Answer 0 (score: 0)

It seems you can be slightly faster by drawing uniform samples and then "inverting" the cumulative distribution with np.searchsorted:

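import numpy as np

# assume arbitrary probabilities
weights = np.random.randn(1000)**2
weights /= weights.sum()

def weighted_random(w, n):
    cumsum = np.cumsum(w)
    cumsum /= cumsum[-1]  # make the last entry exactly 1.0 so no draw falls past the end
    rdm_unif = np.random.rand(n)  # n uniform draws in [0, 1)
    # side='right' returns the first index whose cumulative weight exceeds the draw,
    # so entries with zero weight are never picked
    return np.searchsorted(cumsum, rdm_unif, side='right')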

# first method
%timeit np.random.choice(a = 1000, size = 1000, replace = True, p = weights)
# 10000 loops, best of 3: 220 µs per loop

# proposed method
%timeit weighted_random(weights, 1000)
# 10000 loops, best of 3: 158 µs per loop
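One further tweak, sketched here under the assumption that `weights` stays the same on every pass through the outer loop: build and normalize the cumulative sum once, outside the loop, so each call only pays for the uniform draws and the binary search. The sketch also swaps in the `np.random.default_rng()` Generator API (NumPy 1.17 and later) for `np.random.rand`. `make_weighted_sampler` is a hypothetical helper name, and this variant has not been timed against the numbers above.

```python
import numpy as np


def make_weighted_sampler(weights, rng=None):
    """Return a function that draws n indices with probability proportional to weights.

    The cumulative distribution is built once here, so repeated calls inside a
    loop only pay for the uniform draws and the binary search.
    """
    rng = np.random.default_rng() if rng is None else rng
    cdf = np.cumsum(np.asarray(weights, dtype=float))
    cdf /= cdf[-1]  # last entry becomes exactly 1.0, so no draw lands past the end

    def sample(n):
        u = rng.random(n)  # n uniform draws in [0, 1)
        # side='right': first index whose cumulative weight exceeds u,
        # so zero-weight entries are never returned
        return np.searchsorted(cdf, u, side='right')

    return sample


# Example with arbitrary weights (illustrative only)
rng = np.random.default_rng(0)
weights = rng.standard_normal(1000) ** 2
weights /= weights.sum()

sample = make_weighted_sampler(weights, rng)
for _ in range(5):  # stands in for the larger outer loop
    resampled_indices = sample(1000)
```

The closure keeps the precomputed `cdf` private, so the loop body reduces to a single `sample(1000)` call.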

Now we can check empirically that the probabilities are correct:

samples = np.empty((10000, 1000), dtype=int)
for i in range(10000):
    samples[i, :] = weighted_random(weights, 1000)

empirical = np.bincount(samples.flatten(), minlength=weights.size) / samples.size
((empirical - weights)**2).max()
# 3.5e-09
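If the weights are the same in every iteration (an assumption; the question does not say), the thousands of separate calls can also be collapsed into one: `np.searchsorted` accepts an array of draws of any shape and returns indices of the same shape, so a single `(n_iter, n)` block of uniforms yields every iteration's indices at once. A minimal sketch, with `batched_weighted_indices` as a hypothetical name; memory grows as `n_iter * n`, so very large batches may need to be split into chunks.

```python
import numpy as np


def batched_weighted_indices(weights, n_iter, n, rng=None):
    """Return an (n_iter, n) array; row i holds the indices for iteration i."""
    rng = np.random.default_rng() if rng is None else rng
    cdf = np.cumsum(np.asarray(weights, dtype=float))
    cdf /= cdf[-1]  # normalize so the last entry is exactly 1.0
    u = rng.random((n_iter, n))  # all uniform draws at once
    # searchsorted accepts draws of any shape and returns indices of the same shape
    return np.searchsorted(cdf, u, side='right')


rng = np.random.default_rng(0)
weights = rng.standard_normal(1000) ** 2
weights /= weights.sum()

samples = batched_weighted_indices(weights, 2000, 1000, rng)  # shape (2000, 1000)
# each row can play the role of resampled_indices in one loop iteration

# same empirical check as above, on the batched draws
empirical = np.bincount(samples.ravel(), minlength=weights.size) / samples.size
print(((empirical - weights) ** 2).max())
```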

Answer 1 (score: 0)

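For smaller sample sizes, I found the Python 3.6 function random.choices to be faster. In the script below, the break-even point is at a sample size of 99; as the sample size decreases below that, random.choices becomes faster than numpy.random.choice. Without weights, the break-even point is somewhat higher, at 120. However, for a population of 1000, random.choices is 3 times slower with weights and 7 times slower without weights.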

import numpy as np
import time
import random

SIZE = 98


def numpy_choice():
    for _ in range(10000):
        resampled_indices = np.random.choice(a=population_array, size=SIZE, replace=True, p=weights)


def python_choices():
    for _ in range(10000):
        resampled_indices = random.choices(population_list, weights=weights, k=SIZE)


if __name__ == '__main__':
    weights = [1/SIZE for i in range(SIZE)]
    population_array = np.arange(SIZE)
    population_list = list(population_array)

    start_time = time.time()
    numpy_choice()
    end_time = time.time()
    print('numpy.choice time:', end_time-start_time) 
    # gave 0.299

    start_time = time.time()
    python_choices()
    end_time = time.time()
    print('python random.choices time:', end_time-start_time)
    # gave 0.296
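When the weights do not change between calls, `random.choices` can also be given precomputed cumulative weights through its `cum_weights` keyword; when called with `weights=` it accumulates that list again on every call. A sketch assuming fixed uniform weights as in the script above (`python_choices_cum` is a hypothetical name, and the saving has not been measured here):

```python
import itertools
import random

SIZE = 98
population_list = list(range(SIZE))
weights = [1 / SIZE] * SIZE

# Accumulate once; passing weights= makes random.choices rebuild this list per call.
cum_weights = list(itertools.accumulate(weights))


def python_choices_cum(n_iter=10000):
    resampled_indices = []
    for _ in range(n_iter):
        resampled_indices = random.choices(population_list, cum_weights=cum_weights, k=SIZE)
    return resampled_indices  # the last draw, kept for inspection


last = python_choices_cum()
```

This removes one pass over the population per call; whether it moves the break-even point noticeably would need to be timed.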