Question

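I found that one bottleneck in my simulation is generating random numbers from a Poisson distribution. My original code looks like this: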

import numpy as np
#Generating some data. In the actual code this comes from the previous
#steps in the simulation. But this gives an example of the type of data
n = 5000000
pop_n = np.array([range(500000)])

pop_n[:] = np.random.poisson(lam=n*pop_n/np.sum(pop_n))

Now, I have read that Numba can improve speed very easily. I defined the function

from numba import jit

@jit()
def poisson(n, pop_n, np=np):
    return np.random.poisson(lam=n*pop_n/np.sum(pop_n))

This does run faster than the original. However, I tried to go one step further :) When I write

@jit(nopython=True)
def poisson(n, pop_n, np=np):
    return np.random.poisson(lam=n*pop_n/np.sum(pop_n))

I get

Failed at nopython (nopython frontend)
Invalid usage of Function(np.random.poisson) with parameters     (array(float64, 1d, C))
Known signatures:
 * (float64,) -> int64
 * () -> int64
 * parameterized

Some questions: why does this error occur, and how can I fix it?

Is there a better optimization?

Answer 1

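Numba does not support an array as the lam argument of np.random.poisson, so you have to write the loop yourself:

import numba as nb
import numpy as np

@nb.njit
def poisson(n, pop_n):
    # Output array with the same shape and dtype as pop_n
    res = np.empty_like(pop_n)
    pop_n_sum = np.sum(pop_n)
    # nopython mode only accepts a scalar lam, so draw one sample per element
    for idx in range(pop_n.shape[0]):
        res[idx] = np.random.poisson(n * pop_n[idx] / pop_n_sum)
    return res

n = 5000000
pop_n = np.arange(1, 500000, dtype=float)
poisson(n, pop_n)

But according to my timings, this is just as fast as pure NumPy:

%timeit poisson(n, pop_n)
# 203 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np.random.poisson(lam=n*pop_n/np.sum(pop_n))
# 203 ms ± 3.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That is because, even though Numba supports functions such as np.random.poisson and np.sum, they are there for convenience rather than to speed up the code (much). It may avoid some function-call overhead, but the pure-Python version calls np.random.poisson and np.sum only once each, so there is not much to save (it is completely negligible compared with generating 500,000 random numbers).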
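To see where the time goes without relying on IPython's %timeit, here is a rough sketch in plain NumPy. It times the preparation of the rates (the np.sum and the division) separately from the sampling itself. The helper best_time is a hypothetical name introduced only for this illustration, and the absolute numbers will differ from machine to machine.

```python
import time
import numpy as np

def best_time(func, repeats=3):
    """Return the best wall-clock time in seconds of func() over `repeats` runs.

    Hypothetical helper for this sketch, not part of NumPy or Numba.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best

n = 5_000_000
pop_n = np.arange(1, 500_000, dtype=float)
lam = n * pop_n / np.sum(pop_n)  # one Poisson rate per element

# Time the cheap part (building the rates) and the expensive part (sampling).
t_rates = best_time(lambda: n * pop_n / np.sum(pop_n))
t_sample = best_time(lambda: np.random.poisson(lam=lam))

print(f"computing lam: {t_rates * 1e3:.2f} ms")
print(f"sampling:      {t_sample * 1e3:.2f} ms")
```

If the sampling time dominates on your machine as well, then removing the call overhead around it cannot change the total by much.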

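If you want to speed up a loop that cannot be expressed in pure NumPy, Numba is very fast, but you should not expect Numba (or anything else) to provide a substantial speedup over NumPy functions. If it were easy to make them faster, the NumPy developers would have done so already. :)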
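As an illustration of the kind of loop meant here, consider a sequence where each Poisson draw depends on the previous one, so the draws cannot be produced by one vectorized call. The sketch below is not taken from the question: branching_process, x0, growth and steps are hypothetical names, and the fallback njit decorator exists only so the example still runs when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: fall back to plain Python without Numba.
    def njit(func):
        return func

@njit
def branching_process(x0, growth, steps):
    """Hypothetical example: x[t + 1] ~ Poisson(growth * x[t])."""
    path = np.empty(steps + 1, dtype=np.int64)
    path[0] = x0
    for t in range(steps):
        # Each rate depends on the previous draw, so this loop cannot be
        # replaced by a single vectorized np.random.poisson call.
        path[t + 1] = np.random.poisson(growth * path[t])
    return path

path = branching_process(100, 1.01, 200)
print(path[:10])
```

Here np.random.poisson is called with a scalar rate, which is the form the error message in the question lists as supported in nopython mode. Whether the compiled loop is noticeably faster than the plain Python one is worth measuring for your own case.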

Numba and random numbers from a Poisson distribution