Question

I have a Cython module:

#!python
#cython: language_level=3, boundscheck=False, nonecheck=False

import numpy as np
cimport numpy as np

def portfolio_s2( double[:,:] cv, double[:] weights ):    
    """ Calculate portfolio variance"""
    cdef double s0
    cdef double s1
    cdef double s2
    s0 = 0.0
    for i in range( weights.shape[0] ):
        s0 += weights[i]*weights[i]*cv[i,i]

    s1 = 0.0
    for i in range( weights.shape[0]-1 ):
        s2 = 0.0
        for j in range( i+1, weights.shape[0] ):
            s2 += weights[j]*cv[i,j]
        s1+= weights[i]*s2
    return s0+2.0*s1

I have the same function in Numba:

import numba as nb

@nb.jit( nopython=True )
def portfolio_s2( cv, weights ):
    """ Calculate portfolio variance using numba """
    s0 = 0.0
    for i in range( weights.shape[0] ):
        s0 += weights[i]*weights[i]*cv[i,i]

    s1 = 0.0
    for i in range( weights.shape[0]-1 ):
        s2 = 0.0
        for j in range( i+1, weights.shape[0] ):
            s2 += weights[j]*cv[i,j]
        s1+= weights[i]*s2
    return s0+2.0*s1
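
For reference, both versions compute the quadratic form w^T C w by summing the diagonal terms and twice the upper triangle, which matches the full quadratic form only when cv is symmetric (true for a covariance matrix). A minimal sketch for checking results against NumPy; the names portfolio_s2_loops and portfolio_s2_numpy are illustrative and not part of the original code:

```python
import numpy as np

def portfolio_s2_loops(cv, weights):
    """Pure-Python copy of the loops above (slow; for checking only)."""
    n = weights.shape[0]
    s0 = 0.0
    for i in range(n):
        s0 += weights[i] * weights[i] * cv[i, i]
    s1 = 0.0
    for i in range(n - 1):
        s2 = 0.0
        for j in range(i + 1, n):
            s2 += weights[j] * cv[i, j]
        s1 += weights[i] * s2
    return s0 + 2.0 * s1

def portfolio_s2_numpy(cv, weights):
    """Quadratic form w^T C w; equals the loop result only for symmetric cv."""
    return float(weights @ cv @ weights)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
cv = np.cov(X, rowvar=False)   # symmetric 10x10 covariance matrix
w = np.ones(cv.shape[0])
assert np.isclose(portfolio_s2_loops(cv, w), portfolio_s2_numpy(cv, w))
```

The same comparison can be run against the compiled Cython and Numba functions to confirm they agree before timing them.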

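For a covariance matrix of size 10, the Numba version is 20 times faster than the Cython version. I assume I am doing something wrong in Cython, but I am new to Cython and don't know what to change.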

Using Cel's optimization...

I wrote a script to test Cel's code against the Numba version:

    sizes = [ 2, 3, 4, 6, 8, 12, 16, 32, 48, 64, 96, 128, 196, 256 ]
    cython_timings = []
    numba_timings = []
    for size in sizes:
        X = np.random.randn(100,size)
        cv = np.cov( X, rowvar=0 )
        w  = np.ones( cv.shape[0] )

        num_tests=10

        pm.portfolio_s2( cv, w )
        with Timer( 'Cython' ) as cython_timer:
            for _ in range( num_tests ):
                s2_cython = pm.portfolio_s2_opt( cv, w )
        cython_timings.append( cython_timer.interval )

        helpers.portfolio_s2( cv, w )
        with Timer( 'Numba' ) as numba_timer:
            for _ in range( num_tests ):
                s2_numba = helpers.portfolio_s2( cv, w )
        numba_timings.append( numba_timer.interval )

    plt.plot( sizes, cython_timings, label='Cython' )
    plt.plot( sizes, numba_timings, label='Numba' )
    plt.title( 'Execution Time By Covariance Size' )
    plt.legend()
    plt.show()
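
The Timer helper used in the script is not shown in the question. A minimal sketch of what it might look like, assuming it is a context manager that exposes the elapsed time in seconds as an interval attribute (this is a guess at the interface, not the author's actual code):

```python
import time

class Timer:
    """Hypothetical stand-in for the Timer helper used in the script above."""

    def __init__(self, name):
        self.name = name        # label only; not used for timing
        self.interval = None    # elapsed seconds, set when the block exits

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.interval = time.perf_counter() - self._start
        return False            # do not suppress exceptions
```

With this definition, timer.interval holds the total time for all num_tests calls, not the time per call.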

The resulting chart looks like this:

[Figure: Execution Time By Covariance Size, Cython vs. Numba]

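The chart shows that Numba performs better for small covariance matrices. As the matrix size grows, however, Cython scales better and eventually wins by a wide margin.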

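Is there some kind of function-call overhead that makes Cython perform so poorly on small matrices? My use case involves computing the variance for many small covariance matrices, so I need good performance on small matrices rather than large ones.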
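
One way to check whether fixed per-call cost dominates at small sizes is to time many calls and report the time per call. A sketch using timeit from the standard library; portfolio_s2_stub is a hypothetical NumPy stand-in, to be swapped for the Cython or Numba function under test:

```python
import timeit
import numpy as np

def portfolio_s2_stub(cv, weights):
    # Hypothetical stand-in; replace with pm.portfolio_s2_opt or helpers.portfolio_s2.
    return float(weights @ cv @ weights)

def per_call_seconds(func, cv, w, number=10_000, repeat=5):
    """Best-of-`repeat` average time of a single call, in seconds."""
    func(cv, w)  # warm-up call (triggers JIT compilation for a Numba function)
    totals = timeit.repeat(lambda: func(cv, w), number=number, repeat=repeat)
    return min(totals) / number

for size in (2, 4, 8, 16):
    X = np.random.randn(100, size)
    cv = np.cov(X, rowvar=False)
    w = np.ones(size)
    print(size, per_call_seconds(portfolio_s2_stub, cv, w))
```

If the per-call time stays roughly flat as the size grows from 2 to 16, the cost is probably dominated by call and argument-conversion overhead rather than by the arithmetic in the loops.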

Answer 1

The most important thing when using Cython is to make sure everything is statically typed.

In your example, the loop variables i and j are untyped. Declaring cdef size_t i, j already gives you a huge speedup.

There are good examples in the Working with NumPy section of the Cython documentation.

Here is my setup and evaluation:

import numpy as np
n = 100
cv = np.random.rand(n,n)
weights= np.random.rand(n)

Original version:

%timeit portfolio_s2(cv, weights)
10000 loops, best of 3: 147 µs per loop

Optimized version:

%timeit portfolio_s2_opt(cv, weights)
100000 loops, best of 3: 10 µs per loop

Here is the code:

import numpy as np
cimport numpy as np


def portfolio_s2_opt(double[:,:] cv, double[:] weights):    
    """ Calculate portfolio variance"""
    cdef double s0
    cdef double s1
    cdef double s2
    cdef size_t i, j

    s0 = 0.0
    for i in range( weights.shape[0] ):
        s0 += weights[i]*weights[i]*cv[i,i]

    s1 = 0.0
    for i in range( weights.shape[0]-1 ):
        s2 = 0.0
        for j in range( i+1, weights.shape[0] ):
            s2 += weights[j]*cv[i,j]
        s1+= weights[i]*s2
    return s0+2.0*s1
