Assigning to arbitrary array positions with Cython: does assignment speed depend on the values?

Time: 2018-03-03 03:15:58

Tags: python c arrays cython variable-assignment

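I am seeing some strange behavior in my code. I am writing code to compute a forward Kalman filter, but my state-transition model has many 0s, so it would be nice to be able to compute only certain elements of the covariance matrix.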
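For context, the step where the sparsity of the transition model can pay off is the covariance prediction. The following is a minimal sketch, assuming the standard linear Kalman prediction P⁻ = F P Fᵀ + Q (F: state-transition matrix, P: current covariance, Q: process-noise covariance). The function name `predict_cov_entries` and the choice of which (i, j) entries to request are hypothetical; pure Python loops like these are slow, so in practice the loops would be moved into Cython.

```python
import numpy as np


def predict_cov_entries(F, P, Q, entries):
    """Return {(i, j): (F @ P @ F.T + Q)[i, j]} for the requested (i, j) pairs.

    Only the nonzero coefficients of F are visited, so a transition model
    with many zeros needs far fewer multiply-adds than the full product.
    """
    # Column indices of the nonzero entries in each row of F (computed once).
    nonzero_cols = [np.flatnonzero(row) for row in F]
    result = {}
    for i, j in entries:
        acc = Q[i, j]
        for k in nonzero_cols[i]:
            for m in nonzero_cols[j]:
                acc += F[i, k] * P[k, m] * F[j, m]
        result[(i, j)] = float(acc)
    return result


# Small example: constant-velocity style transition, unit covariance, no noise.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
P = np.eye(2)
Q = np.zeros((2, 2))
print(predict_cov_entries(F, P, Q, [(0, 0), (0, 1)]))  # {(0, 0): 2.0, (0, 1): 1.0}
```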

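So, to test this, I wanted to use Cython to fill individual array elements. To my surprise, I found that

  1. writing the output to a specific array position (function fill(...)) is very slow compared with simply assigning it to a scalar variable on every iteration (function nofill(...)), which essentially throws the result away, and

  2. changing C from 0.1 to 31 does not affect how long nofill(...) takes to run, but the latter choice of C makes fill(...) run about 2x slower. This puzzles me. Can anyone explain why I am seeing this?

  3. Code: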

    #################  file way_too_slow.pyx
    from libc.math cimport sin
    
    #  Setting C=0.1 or 31 doesn't affect the performance of nofill(...), but it makes fill(...) slower.  I have no clue why.
    cdef double C = 0.1
    
    #  This function just throws away its output.
    
    def nofill(double[::1] x, double[::1] y, long N):
        cdef int i
        cdef double *p_x = &x[0]
        cdef double *p_y = &y[0]
        cdef double d
    
        with nogil:
            for 0 <= i < N:
                d = ((p_x[i] + p_y[i])*3 + p_x[i] - p_y[i]) + sin(p_x[i]*C)  #  C appears here
    
    #  Same computation, but this function keeps its output in out.
    #  However (observation 1): MUCH slower than nofill(...)
    def fill(double[::1] x, double[::1] y, double[::1] out, long N):
        cdef int i
        cdef double *p_x = &x[0]
        cdef double *p_y = &y[0]
        cdef double *p_o = &out[0]
        cdef double d
    
        with nogil:
            for 0 <= i < N:
                p_o[i] = ((p_x[i] + p_y[i])*3 + p_x[i] - p_y[i]) + sin(p_x[i]*C)    # C appears here
    

    The code above is called from a Python program:

    ####################  run_way_too_slow.py
    import numpy as _N
    import time as _tm
    import way_too_slow as _wts
    
    N = 80000
    x = _N.random.randn(N)
    y = _N.random.randn(N)
    out  = _N.empty(N)
    
    t1 = _tm.time()
    _wts.nofill(x, y, N)       #  result is discarded inside the loop
    t2 = _tm.time()
    _wts.fill(x, y, out, N)    #  result is written into out
    t3 = _tm.time()
    
    print("nofill() ET: %.3e" % (t2-t1))
    print("fill()   ET: %.3e" % (t3-t2))
    
    print("fill() is slower by factor %.3f" % ((t3-t2)/(t2-t1)))
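A single pair of `time.time()` calls around a call that lasts about 1e-4 s can be noticeably affected by timer noise and one-off effects. A minimal sketch for repeatable timing plus a correctness check of `out`, assuming NumPy is available; `reference_fill` and `best_time` are hypothetical helpers, not part of the original code.

```python
import timeit

import numpy as np


def reference_fill(x, y, C):
    """NumPy version of the expression that fill(...) writes into out."""
    return ((x + y) * 3 + x - y) + np.sin(x * C)


def best_time(func, *args, repeat=5, number=20):
    """Best per-call wall time (s) of func(*args) over `repeat` timing runs."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


x = np.array([0.0, 1.0])
y = np.array([0.0, 0.5])
print(reference_fill(x, y, 0.1))  # second element is 5.0 + sin(0.1)
```

With the compiled module, calling `_wts.fill(x, y, out, N)` and then `np.testing.assert_allclose(out, reference_fill(x, y, 0.1))` checks that `out` holds the expected values (for a build with C = 0.1), and comparing `best_time(_wts.fill, x, y, out, N)` with `best_time(_wts.nofill, x, y, N)` gives a less noisy version of the ratio printed by the script.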
    

    The Cython code is compiled with a setup.py file:

    #################  setup.py
    import sysconfig
    
    from setuptools import setup, Extension
    from Cython.Build import cythonize
    
    incdir = [sysconfig.get_paths()["platinclude"]]   #  Python headers (include_dirs for Mac)
    libdir = ['/usr/local/lib']
    
    ext_modules = Extension("way_too_slow",
                            ["way_too_slow.pyx"],
                            include_dirs=incdir,
                            library_dirs=libdir)
    
    setup(
        name="way_too_slow",
        ext_modules=cythonize([ext_modules])
    )
    

    Here is typical output from running "run_way_too_slow.py" with C = 0.1:

    >>> exf("run_way_too_slow.py")
    nofill() ET: 6.700e-05
    fill()   ET: 6.409e-04
    fill() is slower by factor 9.566
    

    A typical run with C = 31:

    >>> exf("run_way_too_slow.py")
    nofill() ET: 6.795e-05
    fill()   ET: 1.566e-03
    fill() is slower by factor 23.046
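The factor of about 2 in point 2 comes from comparing the two fill() timings with each other. The arithmetic, using only the numbers reported in the two runs above:

```python
# Timings in seconds, copied from the two runs above.
runs = {
    0.1: {"nofill": 6.700e-05, "fill": 6.409e-04},
    31: {"nofill": 6.795e-05, "fill": 1.566e-03},
}

for C, t in runs.items():
    print(f"C = {C}: fill/nofill = {t['fill'] / t['nofill']:.3f}")

# Effect of changing C on each function separately.
fill_ratio = runs[31]["fill"] / runs[0.1]["fill"]
nofill_ratio = runs[31]["nofill"] / runs[0.1]["nofill"]
print(f"fill(C=31) / fill(C=0.1)     = {fill_ratio:.3f}")    # 2.443
print(f"nofill(C=31) / nofill(C=0.1) = {nofill_ratio:.3f}")  # 1.014
```

In these single runs, fill() took about 2.4 times as long with C = 31, while nofill() changed by only about 1.4%.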
    

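    We can see that

    1. assigning to a specific array position is considerably slower than assigning to a double, and

    2. for some reason, the assignment speed seems to depend on the operations in the computation, which makes no sense to me.

    Any insights are greatly appreciated.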
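One way to probe point 2 is to time the sin() evaluation on its own at both values of C, which separates its cost from the cost of the store into `out`. A minimal sketch using NumPy, with one loudly stated assumption: `np.sin` may use a different (possibly vectorized) implementation than the libc `sin` that the Cython code calls, so the result is only indicative of what libc does. `time_sin` is a hypothetical helper, and no timings are claimed here.

```python
import timeit

import numpy as np


def time_sin(x, C, repeat=5, number=50):
    """Best per-call time (s) of np.sin on x * C over several timing runs."""
    scaled = x * C  # scale once, outside the timer, so only sin() is measured
    timer = timeit.Timer(lambda: np.sin(scaled))
    return min(timer.repeat(repeat=repeat, number=number)) / number


rng = np.random.default_rng(0)
x = rng.standard_normal(80000)  # same N as run_way_too_slow.py
for C in (0.1, 31):
    print(f"C = {C}: sin() alone takes {time_sin(x, C):.3e} s per call")
```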

0 answers
