Running time of a parallelized function in Cython

Time: 2019-03-28 23:05:29

Tags: python parallel-processing cython

I have two very simple Cython functions, shown below (compiled with -fopenmp):

#cython: language_level=3
#cython: wraparound=False
#cython: boundscheck=False
#cython: nonecheck=False
#cython: cdivision=True

import numpy as np
cimport numpy as np
cimport cython
from cython.parallel cimport prange, parallel


def ta_pa(double[:,::1] out, double[:,::1] u, double[:,::1] K, double a):


    cdef Py_ssize_t ix, iz
    cdef Py_ssize_t nx = out.shape[0]
    cdef Py_ssize_t nz = out.shape[1]

    with nogil, parallel():
        for ix in prange(nx):
            for iz in range(nz):
                out[ix, iz] = u[ix, iz] - a*K[ix, iz]

def ta_li(double[:,::1] out, double[:,::1] u, double[:,::1] K, double a):


    cdef Py_ssize_t ix, iz
    cdef Py_ssize_t nx = out.shape[0]
    cdef Py_ssize_t nz = out.shape[1]


    for ix in range(nx):
        for iz in range(nz):
            out[ix, iz] = u[ix, iz] - a*K[ix, iz]

When I test these two functions in a notebook with timeit, using 256x256 arrays, I get the following results:

  • ta_li: 43.3 µs
  • ta_pa: 28.1 µs
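For anyone who wants to reproduce this kind of measurement without compiling the extension, here is a minimal sketch of a timeit benchmark on 256x256 arrays. It uses a plain NumPy stand-in for the same arithmetic (out = u - a*K); the names ta_ref and best_time, the random input data, and the value of a are assumptions made for illustration and are not part of the original code.

```python
import timeit

import numpy as np


def ta_ref(out, u, K, a):
    """NumPy stand-in (hypothetical) for out = u - a*K, written in place."""
    np.multiply(K, a, out=out)    # out = a*K
    np.subtract(u, out, out=out)  # out = u - a*K


def best_time(func, *args, number=200, repeat=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return min(runs) / number


# Assumed test data: 256x256 C-contiguous double arrays, as in the question.
rng = np.random.default_rng(0)
u = rng.random((256, 256))
K = rng.random((256, 256))
out = np.empty_like(u)
a = 0.5

t = best_time(ta_ref, out, u, K, a)
print(f"ta_ref: {t * 1e6:.1f} µs per call")
```

The same best_time helper could be pointed at ta_li and ta_pa once the module is built. Taking the minimum over several repeats is one common way to reduce noise from other processes.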

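So far, so good! However, when I swap ta_li for ta_pa in a larger script (where this function is called thousands of times, alongside other functions), the script runs much more slowly than it does with ta_li!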

  • When the script runs with ta_pa, 1 CPU is in use
  • When the script runs with the other function, 4 CPUs are in use
  • I used 256x256 arrays in all tests
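Since the CPU usage reported above differs between runs, it may help to log what the script itself sees before calling either function. The sketch below only reads the logical CPU count and the standard OpenMP environment variable OMP_NUM_THREADS. The helper name thread_settings is hypothetical, and whether the compiled extension's OpenMP runtime honors the variable in a given setup is an assumption to verify.

```python
import os


def thread_settings():
    """Return the logical CPU count and the OMP_NUM_THREADS setting, if any."""
    return {
        "logical_cpus": os.cpu_count(),  # may be None if undetermined
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),  # None if unset
    }


settings = thread_settings()
print(settings)
```

Printing this at the top of both the notebook and the larger script would show whether the two environments are configured differently. On the Cython side, prange also accepts a num_threads argument, which could be used to pin the thread count explicitly while testing.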

I'm sure there is a sensible explanation for this behavior, but I can't see it. Does anyone have a hypothesis?

0 answers:

No answers yet