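I have observed that Python's default random.sample is much faster than numpy's random.choice. When taking a small sample from an array of length 1 million, random.sample is more than 1000 times faster than its numpy counterpart.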
In [1]: import numpy as np
In [2]: import random
In [3]: arr = [x for x in range(1000000)]
In [4]: nparr = np.array(arr)
In [5]: %timeit random.sample(arr, 5)
The slowest run took 5.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 4.54 µs per loop
In [6]: %timeit np.random.choice(arr, 5)
10 loops, best of 3: 47.7 ms per loop
In [7]: %timeit np.random.choice(nparr, 5)
The slowest run took 6.79 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 7.79 µs per loop
Although numpy sampling from a numpy array is fairly fast, it is still slower than the default random sampling.
Is the above observation correct, or am I missing some difference in what random.sample and np.random.choice compute?
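The IPython session above relies on the `%timeit` magic. A minimal sketch for reproducing the same three measurements in a plain Python script uses the standard `timeit` module instead; the helper `best_per_call` and the `results` dict are hypothetical names chosen for illustration, and the absolute numbers will vary by machine and library version.

```python
import random
import timeit

import numpy as np

# Same setup as the IPython session: a list and a NumPy array of 1,000,000 ints.
arr = list(range(1_000_000))
nparr = np.array(arr)


def best_per_call(func, number):
    """Best-of-3 wall-clock time per call, in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number


results = {
    "random.sample(list)": best_per_call(lambda: random.sample(arr, 5), 10_000),
    # Few iterations here: each call is slow because of the list-to-array conversion.
    "np.random.choice(list)": best_per_call(lambda: np.random.choice(arr, 5), 5),
    "np.random.choice(ndarray)": best_per_call(lambda: np.random.choice(nparr, 5), 10_000),
}

for name, seconds in results.items():
    print(f"{name:28s} {seconds * 1e6:12.2f} µs per call")
```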
Answer (score: 1)
What you are seeing in your first call to numpy.random.choice is just the overhead of converting the list arr to a numpy array.
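One way to check this claim is to time the list-to-array conversion on its own and compare it with sampling from an array that was converted once up front. This is a sketch that assumes `np.random.choice` converts a list argument to an array internally, which is what the answer describes; the names `per_call`, `convert_cost` and `sample_cost` are hypothetical.

```python
import timeit

import numpy as np

arr = list(range(1_000_000))
nparr = np.array(arr)  # convert once, outside the timed loop


def per_call(func, number):
    """Best-of-3 wall-clock time per call, in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number


# Converting the list by itself: roughly what np.random.choice(arr, 5) pays on every call.
convert_cost = per_call(lambda: np.array(arr), 5)
# Sampling from the pre-converted array: no conversion inside the loop.
sample_cost = per_call(lambda: np.random.choice(nparr, 5), 10_000)

print(f"list -> array conversion: {convert_cost * 1e3:.2f} ms")
print(f"choice on ndarray:        {sample_cost * 1e6:.2f} µs")
```

If the conversion time is close to the 47.7 ms measured for `np.random.choice(arr, 5)`, the conversion accounts for almost all of that cost; converting once and reusing `nparr` avoids paying it on every call.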
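As for your second call, it is probably slightly slower because numpy.random.choice is more general: it supports non-uniform sampling (the p parameter) and can sample either with or without replacement (the replace parameter), and that flexibility carries some overhead. Note also that replace defaults to True, so np.random.choice(nparr, 5) samples with replacement, whereas random.sample always samples uniformly without replacement; the two calls in your benchmark do not compute exactly the same thing.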
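The semantic differences mentioned in the answer can be shown on a tiny population. This sketch uses the legacy `np.random.choice` from the question; the function name `demo` and the weights in `p` are hypothetical, chosen so the results can be checked deterministically (drawing 11 items from 10 with replacement must repeat a value).

```python
import random

import numpy as np

population = list(range(10))
nparr = np.arange(10)


def demo():
    # random.sample: uniform and always without replacement, so 5 distinct items.
    plain = random.sample(population, 5)
    # np.random.choice defaults to replace=True: 11 draws from 10 values must repeat one.
    with_replacement = np.random.choice(nparr, 11)
    # replace=False gives distinct items, matching random.sample's semantics.
    without_replacement = np.random.choice(nparr, 5, replace=False)
    # p= sets non-uniform probabilities (must sum to 1); only 2 and 7 can be drawn here.
    p = np.zeros(10)
    p[[2, 7]] = 0.5
    weighted = np.random.choice(nparr, 20, p=p)
    return plain, with_replacement, without_replacement, weighted


plain, with_replacement, without_replacement, weighted = demo()
print(plain, with_replacement, without_replacement, weighted, sep="\n")
```

random.sample has no weights parameter, so the `p=` case has no direct counterpart there; asking either function for more items than the population holds without replacement raises `ValueError`.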