Question

I have a simulation in which the end user can supply arbitrarily many functions, which are then called in the innermost loop. Something like:

class Simulation:

    def __init__(self):
        self.rates = []
        self.amount = 1

    def add(self, rate):
        self.rates.append(rate)

    def run(self, maxtime):
        for t in range(0, maxtime):
            for rate in self.rates:
                self.amount *= rate(t)

def rate(t):
    return t**2

simulation = Simulation()

simulation.add(rate)
simulation.run(100000)

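As a pure Python loop this is very slow, but I can't use my usual approaches to speed up the loop.

Because the functions are user defined, I can't "numpyfy" the innermost call (rewrite it so that the innermost work is done by optimized numpy code).

I first tried numba, but numba doesn't allow passing functions to other functions, even if those functions are also numba-compiled. It can use closures, but since I don't know up front how many functions there will be, I don't think I can use that. Closing over a list of functions fails: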

import numba

@numba.jit(nopython=True)
def a():
    return 1

@numba.jit(nopython=True)
def b():
    return 2

fs = [a, b]

@numba.jit(nopython=True)
def c():
    total = 0
    for f in fs:
        total += f()
    return total

c()

This fails with the error:

[...]
  File "/home/syrn/.local/lib/python3.6/site-packages/numba/types/containers.py", line 348, in is_precise
    return self.dtype.is_precise()
numba.errors.InternalError: 'NoneType' object has no attribute 'is_precise' 
[1] During: typing of intrinsic-call at <stdin> (4)

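I can't find the source, but I think numba's documentation states somewhere that this is not a bug and is simply not expected to work.

Something like the following would probably work around calling functions from a list, but it looks like a bad idea: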

def run(self, maxtime):
    rates = self.rates
    len_rates = len(rates)
    f1 = rates[0]
    # only bind f2, f3, ... when that many functions were supplied
    if len_rates > 1:
        f2 = rates[1]
    if len_rates > 2:
        f3 = rates[2]
    #[... repeat until some arbitrary limit]
    @numba.jit(nopython=True)
    def inner(amount):
        for t in range(0, maxtime):
            amount *= f1(t)
            if len_rates > 1:
                amount *= f2(t)
            if len_rates > 2:
                amount *= f3(t)
            #[... repeat until the same arbitrary limit]
        return amount

    self.amount = inner(self.amount)
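
One way to avoid the hard-coded limit in the sketch above is to generate the unrolled loop body as source text and compile it with exec. The following is a minimal pure-Python sketch, not a numba solution: the helper name build_runner and the generated names f0, f1, ... are hypothetical, and whether numba could compile the generated function is not tested here. It is also exactly the kind of code generation the question hopes to avoid, so treat it as an illustration of the trade-off rather than a recommendation.

```python
def build_runner(rates):
    """Return a function inner(amount, maxtime) with one call per rate unrolled.

    build_runner and the generated names f0, f1, ... are illustrative only.
    """
    # Bind every user function to its own global name in the generated code.
    namespace = {f"f{i}": rate for i, rate in enumerate(rates)}

    lines = ["def inner(amount, maxtime):",
             "    for t in range(maxtime):"]
    # One multiplication per user function, so no inner Python loop remains.
    lines += [f"        amount *= f{i}(t)" for i in range(len(rates))]
    if not rates:
        # Keep the generated loop body syntactically valid with no functions.
        lines.append("        pass")
    lines.append("    return amount")

    exec("\n".join(lines), namespace)
    return namespace["inner"]


def rate(t):
    return t ** 2 + 1


def other_rate(t):
    return 2


inner = build_runner([rate, other_rate])
result = inner(1, 3)
```

The generated function takes maxtime as a parameter instead of closing over it, so the same compiled function can be reused for different run lengths.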

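I suppose one could also do some bytecode hacking: compile the functions with numba, pass a list of strings with the function names into inner, do something like call(func_name), and then rewrite the bytecode so that it becomes func_name(t).

With cython, just compiling the loop and the multiplication would probably speed things up a bit, but if the user-defined functions are still Python, merely calling them will probably still be slow (although I haven't profiled that yet). I didn't really find much information on "dynamically compiling" functions with cython, but I guess I would need to somehow add type information to the user-supplied functions, which seems hard.

Is there a good way to speed up a loop that calls user-defined functions, without having to parse them and generate code from them?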

Answer 1

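I don't think you can speed up the user's functions; in the end it is the user's responsibility to write efficient code. What you can do is make it possible to interact with your program efficiently, without paying for overhead.

You could use Cython, and if the user is also game for using Cython, the two of you could achieve a speedup of around 100 compared with the pure Python solution.

As a baseline, I changed your example slightly: the function rate does more work.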

class Simulation:

    def __init__(self, rates):
        self.rates=list(rates)
        self.amount = 1

    def run(self, maxtime):
        for t in range(0, maxtime):
            for rate in self.rates:
                self.amount += rate(t)

def rate(t):
    return t*t*t+2*t

This yields:

>>> simulation=Simulation([rate])
>>> %timeit simulation.run(10**5)
43.3 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
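
%timeit is an IPython magic. Outside IPython, a comparable measurement can be taken with the standard-library timeit module. This is a minimal sketch that repeats the baseline Simulation and rate from above; the repeat and number values are arbitrary choices (smaller than what %timeit used, to keep the run short), it reports the best run rather than mean and standard deviation, and the timings will differ from machine to machine.

```python
import timeit


class Simulation:
    def __init__(self, rates):
        self.rates = list(rates)
        self.amount = 1

    def run(self, maxtime):
        for t in range(0, maxtime):
            for rate in self.rates:
                self.amount += rate(t)


def rate(t):
    return t * t * t + 2 * t


simulation = Simulation([rate])

# Take 3 measurements of 5 calls each; both values are arbitrary.
timings = timeit.repeat(lambda: simulation.run(10**5), repeat=3, number=5)

# Each measurement covers 5 calls, so divide to get the time per call.
best_ms = min(timings) / 5 * 1e3
print(f"best of 3: {best_ms:.1f} ms per loop")
```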

We can use cython to speed things up, starting with your run function:

%%cython
cdef class Simulation:
    cdef int amount
    cdef list rates
    def __init__(self, rates):
        self.rates=list(rates)
        self.amount = 1

    def run(self, int maxtime):
        cdef int t
        for t in range(maxtime):
            for rate in self.rates:
                self.amount *= rate(t)

This gives us almost a factor of 2:

>>> %timeit simulation.run(10**5)
23.2 ms ± 158 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The user can also use Cython to speed up their calculation:

%%cython
def rate(int t):
  return t*t*t+2*t

>>> %timeit simulation.run(10**5)
7.08 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

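Using Cython has already given us a speedup of 6, so what is the bottleneck now? We are still using Python for the polymorphism/dispatch, which is quite costly because Python objects (here, Python integers) have to be created in order to use it. Can we do better with Cython? Yes, if we define, at compile time, an interface for the functions that are passed to run: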

%%cython
cdef class FunInterface:
   cpdef int calc(self, int t):
      pass

cdef class Simulation:
    cdef int amount
    cdef list rates

    def __init__(self, rates):
        self.rates=list(rates)
        self.amount = 1

    def run(self, int maxtime):
        cdef int t
        cdef FunInterface f
        for t in range(maxtime):
            for f in self.rates:
                self.amount *= f.calc(t)

cdef class Rate(FunInterface):
    cpdef int calc(self, int t):
        return t*t*t+2*t

This yields an additional speedup of 7:

>>> simulation=Simulation([Rate()])
>>> %timeit simulation.run(10**5)
 1.03 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The most important part of the code above is the line:

self.amount *= f.calc(t)

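Python is no longer needed for the dispatch; instead, a mechanism very similar to virtual functions in C++ is used. This C++-like approach has only the very small overhead of one indirection/lookup. It also means that neither the result of the function nor its argument needs to be converted to a Python object. For this to work, the calc method of Rate must be a cpdef function; you can look here for more details on how inheritance works for cpdef functions.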
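
The shape of this interface can be sketched in plain Python to show what the end user would have to write. The sketch reuses the FunInterface, Rate and Simulation names from the Cython code above, but it only illustrates the structure: in pure Python the call still goes through ordinary Python dispatch, so it brings none of the speedup. The FunctionAdapter class is a hypothetical addition that wraps an ordinary callable.

```python
from abc import ABC, abstractmethod


class FunInterface(ABC):
    """The contract every user-supplied rate has to fulfil."""

    @abstractmethod
    def calc(self, t: int) -> int: ...


class Rate(FunInterface):
    def calc(self, t):
        return t * t * t + 2 * t


class FunctionAdapter(FunInterface):
    """Hypothetical helper: lets a plain callable satisfy the interface."""

    def __init__(self, func):
        self._func = func

    def calc(self, t):
        return self._func(t)


class Simulation:
    def __init__(self, rates):
        self.rates = list(rates)
        self.amount = 1

    def run(self, maxtime):
        for t in range(maxtime):
            for f in self.rates:
                # Every rate is called through the same calc method.
                self.amount *= f.calc(t)


simulation = Simulation([FunctionAdapter(lambda t: t + 1),
                         FunctionAdapter(lambda t: 2)])
simulation.run(3)
```

The adapter is used in the example instead of Rate because Rate().calc(0) is 0, which would make the product 0 from the first step on.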

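The bottleneck is now the line for f in self.rates, because we still need a lot of Python interaction at every step. Here is an example of what would be possible if we could improve on this: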

%%cython
.....
cdef class Simulation:
    cdef int amount
    cdef FunInterface f  #just one function, no list

    def __init__(self, fun):
        self.f=fun
        self.amount = 1

    def run(self, int maxtime):
        cdef int t
        for t in range(maxtime):
            self.amount *= self.f.calc(t)

...

>>> simulation=Simulation(Rate())
>>> %timeit simulation.run(10**5)
 408 µs ± 1.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

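That is another factor of 2, but it is up to you to decide whether the more complicated code needed to store a list of FunInterface objects without Python interaction is really worth it.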
