Question

I often struggle to find bottlenecks in my Cython code. How can I profile Cython functions line by line?

Answer 1

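Robert Bradshaw helped me get Robert Kern's line_profiler tool working for cdef functions, and I thought I'd share the results on Stack Overflow.

In short, set up a regular .pyx file and build script, and add the following before your call to cythonize.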

from Cython.Compiler.Options import directive_defaults

directive_defaults['linetrace'] = True
directive_defaults['binding'] = True

Furthermore, you need to define the C macro CYTHON_TRACE=1 by modifying your extensions setup such that

extensions = [
    Extension("test", ["test.pyx"], define_macros=[('CYTHON_TRACE', '1')])
]
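
Put together, a complete build script might look like the sketch below. It assumes setuptools is used for the build, and it passes the directives through the compiler_directives argument of cythonize instead of mutating directive_defaults. That is a substitution for the global approach shown above, not what this answer originally used. The module name "test" and the file "test.pyx" are simply the names from the snippet above.

```python
# setup.py: a sketch of a line-tracing build, not a definitive recipe.
from setuptools import Extension, setup
from Cython.Build import cythonize

extensions = [
    Extension(
        "test",
        ["test.pyx"],
        # Without this macro the C preprocessor drops the tracing code.
        define_macros=[("CYTHON_TRACE", "1")],
    )
]

setup(
    ext_modules=cythonize(
        extensions,
        # Per-build directives (assumption: used here in place of
        # modifying directive_defaults globally).
        compiler_directives={"linetrace": True, "binding": True},
    )
)
```

It would typically be built with `python setup.py build_ext --inplace`, after which the compiled functions can be handed to line_profiler.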

A working example using the %%cython IPython magic in a notebook is here: http://nbviewer.ipython.org/gist/tillahoffmann/296501acea231cbdf5e7

Answer 2

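While I wouldn't really call it profiling, there is another option for analyzing your Cython code: running cython with -a (annotate). This creates a web page in which the main bottlenecks are highlighted. For example, when I forget to declare some variables:

After declaring them correctly (cdef double dudz, dvdz):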

Answer 3

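While @Till's answer shows how to profile Cython code using the setup.py approach, this answer is about ad-hoc profiling in an IPython/Jupyter notebook and is more or less a "translation" of the Cython documentation to IPython/Jupyter.

%prun-magic:

If the %prun magic is to be used, it is enough to set Cython's compiler directive profile to True (shown here with the example from the Cython documentation):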

%%cython
# cython: profile=True

def recip_square(i):
    return 1. / i ** 2

def approx_pi(n=10000000):
    val = 0.
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** .5

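Using the global directive comment (i.e. # cython: profile=True) is better than modifying the global Cython state, because changing the comment causes the extension to be recompiled. That is not the case when the global Cython state is changed: the old cached version, compiled with the old global state, is reloaded and reused.

And now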

%prun -s cumulative approx_pi(1000000)

yields:

        1000005 function calls in 1.860 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.860    1.860 {built-in method builtins.exec}
        1    0.000    0.000    1.860    1.860 <string>:1(<module>)
        1    0.000    0.000    1.860    1.860 {_cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.approx_pi}
        1    0.612    0.612    1.860    1.860 _cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.pyx:7(approx_pi)
  1000000    1.248    0.000    1.248    0.000 _cython_magic_404d18ea6452e5ffa4c993f6a6e15b22.pyx:4(recip_square)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
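
Outside IPython, a comparable report can be produced with the standard-library cProfile and pstats modules. The following is a minimal sketch under stated assumptions: the two functions are plain-Python copies of the example so that the block runs without Cython, and profile_cumulative is a hypothetical helper name, not part of any library. For a compiled module, approx_pi would instead be imported from the extension built with profile=True.

```python
import cProfile
import io
import pstats


def recip_square(i):
    return 1. / i ** 2


def approx_pi(n=10000000):
    val = 0.
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** .5


def profile_cumulative(func, *args):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same ordering as "%prun -s cumulative".
    stats.sort_stats("cumulative").print_stats()
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_cumulative(approx_pi, 100000)
    print(report)
```

Capturing the report in a string rather than printing it directly makes it easy to save or compare runs.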

%lprun-magic:

If the line profiler (i.e. the %lprun magic) is to be used, the Cython module has to be compiled with different directives:

%%cython
# cython: linetrace=True
# cython: binding=True
# distutils: define_macros=CYTHON_TRACE_NOGIL=1
...

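linetrace=True triggers the creation of tracing code in the generated C code and implies profile=True, so profile does not need to be set in addition. Without binding=True, line_profiler does not have the necessary code information. CYTHON_TRACE_NOGIL=1 is needed so that line profiling is also activated when the C compiler runs (and is not thrown away by the C preprocessor). It is also possible to use CYTHON_TRACE=1 if nogil blocks do not need to be profiled on a per-line basis.

It can now be used, for example, as follows, passing the functions that should be line-profiled via the -f option (use %lprun? to get information about the available options):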

%load_ext line_profiler
%lprun -f approx_pi -f recip_square approx_pi(1000000)

which yields:

Timer unit: 1e-06 s

Total time: 1.9098 s
File: /XXXX.pyx
Function: recip_square at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def recip_square(i):
     6   1000000    1909802.0      1.9    100.0      return 1. / i ** 2

Total time: 6.54676 s
File: /XXXX.pyx
Function: approx_pi at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def approx_pi(n=10000000):
     9         1          3.0      3.0      0.0      val = 0.
    10   1000001    1155778.0      1.2     17.7      for k in range(1, n + 1):
    11   1000000    5390972.0      5.4     82.3          val += recip_square(k)
    12         1          9.0      9.0      0.0      return (6 * val) ** .5

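line_profiler does, however, have a minor hiccup with cpdef functions: it does not detect the function body correctly. Another Stack Overflow post shows a possible workaround.

One should be aware that profiling (line profiling above all) changes the execution time and its distribution compared to a "normal" run. Here we see that the same function needs different times depending on the type of profiling: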

Method (N=10^6):        Running Time:       Build with:
%timeit                 1 second
%prun                   2 seconds           profile=True
%lprun                  6.5 seconds         linetrace=True,binding=True,CYTHON_TRACE_NOGIL=1
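
The overhead of function-level profiling can be checked with a short measurement like the sketch below. It is only an illustration: it uses plain-Python copies of the example functions and the standard-library cProfile rather than a Cython build, compare_overhead is a hypothetical helper name, and the absolute numbers will differ from those in the table.

```python
import cProfile
import timeit


def recip_square(i):
    return 1. / i ** 2


def approx_pi(n=10000000):
    val = 0.
    for k in range(1, n + 1):
        val += recip_square(k)
    return (6 * val) ** .5


def compare_overhead(n=100000, repeat=3):
    """Return (plain, profiled) best-of-`repeat` wall times in seconds."""
    # Baseline: the function on its own, as %timeit would measure it.
    plain = min(timeit.repeat(lambda: approx_pi(n), number=1, repeat=repeat))

    def profiled_run():
        # A fresh profiler per run, so runs do not accumulate statistics.
        cProfile.Profile().runcall(approx_pi, n)

    profiled = min(timeit.repeat(profiled_run, number=1, repeat=repeat))
    return plain, profiled


if __name__ == "__main__":
    plain, profiled = compare_overhead()
    print(f"plain: {plain:.4f} s, profiled: {profiled:.4f} s")
```

Taking the minimum over several repeats reduces the influence of unrelated system noise on the comparison.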
