I have two very simple Cython functions, shown below (compiled with -fopenmp):
#cython: language_level=3
#cython: wraparound=False
#cython: boundscheck=False
#cython: nonecheck=False
#cython: cdivision=True
import numpy as np
cimport numpy as np
cimport cython
from cython.parallel cimport prange, parallel

# Parallel version: the outer loop is split across OpenMP threads.
def ta_pa(double[:,::1] out, double[:,::1] u, double[:,::1] K, double a):
    cdef Py_ssize_t ix, iz
    cdef Py_ssize_t nx = out.shape[0]
    cdef Py_ssize_t nz = out.shape[1]
    with nogil, parallel():
        for ix in prange(nx):
            for iz in range(nz):
                out[ix, iz] = u[ix, iz] - a*K[ix, iz]

# Serial version: same computation on a single thread.
def ta_li(double[:,::1] out, double[:,::1] u, double[:,::1] K, double a):
    cdef Py_ssize_t ix, iz
    cdef Py_ssize_t nx = out.shape[0]
    cdef Py_ssize_t nz = out.shape[1]
    for ix in range(nx):
        for iz in range(nz):
            out[ix, iz] = u[ix, iz] - a*K[ix, iz]

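When I time these two functions in a notebook with timeit, using 256x256 arrays, I get the following results:

ta_li: 43.3 µs
ta_pa: 28.1 µs

So far, so good! But when I run a larger script and replace ta_li with ta_pa (the function is executed thousands of times, among other functions), the script runs much slower than it does with ta_li!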
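
For anyone who wants to reproduce the timing outside a notebook, here is a minimal sketch of a harness built on the standard timeit module. The names ta_ref, best_time_us and make_grid are hypothetical and not part of my code; ta_ref is a pure-Python stand-in for the kernel so the snippet runs without compiling anything. To time the real functions, pass ta_li or ta_pa together with C-contiguous float64 NumPy arrays instead.

```python
import timeit


def ta_ref(out, u, K, a):
    """Pure-Python stand-in for ta_li / ta_pa: out = u - a*K, element-wise."""
    for ix in range(len(out)):
        row_out, row_u, row_K = out[ix], u[ix], K[ix]
        for iz in range(len(row_out)):
            row_out[iz] = row_u[iz] - a * row_K[iz]


def best_time_us(func, *args, number=10, repeat=3):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


def make_grid(n, value):
    """Build an n x n nested list filled with `value` (illustrative test data)."""
    return [[value] * n for _ in range(n)]


# Small size so the pure-Python stand-in stays fast; the post used 256x256.
n = 64
u, K, out = make_grid(n, 2.0), make_grid(n, 0.5), make_grid(n, 0.0)

ta_ref(out, u, K, 2.0)  # every element becomes 2.0 - 2.0 * 0.5
elapsed = best_time_us(ta_ref, out, u, K, 2.0)
print(f"ta_ref: {elapsed:.1f} us per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the per-call time is only tens of microseconds.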

ta_li: 1 CPU is in use while the script runs
ta_pa: 4 CPUs are in use while the script runs

I'm sure there is a proper explanation for this behavior, but I can't figure it out. Any hypotheses?