How does NumPy use ufunc buffers to speed up summation?

Posted: 2017-10-16 00:54:28

Tags: python numpy f2py

NumPy uses buffers for ufuncs. The documentation says:

Internally, buffers are used for misaligned data, swapped data, and data that has to be converted from one data type to another. The size of internal buffers is settable on a per-thread basis.
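As a rough mental model of what the quoted documentation describes, the sketch below reduces an array in chunks of at most `bufsize` elements, copying each chunk into a contiguous, aligned, native-byte-order float64 scratch buffer before summing it. This is a conceptual illustration only, not NumPy's actual C implementation; the function name `buffered_sum` is hypothetical.

```python
import numpy as np


def buffered_sum(x, bufsize=8192):
    """Conceptual model of a buffered reduction (hypothetical, not NumPy's code).

    The input is processed in chunks of at most `bufsize` elements. Each chunk
    is copied (and cast if necessary) into a reusable float64 buffer that is
    contiguous, aligned, and in native byte order; the buffer is then reduced.
    """
    x = np.asarray(x).ravel()
    buf = np.empty(bufsize, dtype=np.float64)  # reusable scratch buffer
    total = 0.0
    for start in range(0, x.size, bufsize):
        chunk = x[start:start + bufsize]
        n = chunk.size
        # Copy into the buffer; this performs any dtype or byte-order conversion
        buf[:n] = chunk
        total += buf[:n].sum()
    return total
```

In this model a smaller `bufsize` means more chunks and therefore more per-chunk bookkeeping, which is one plausible reason a tiny buffer could slow a reduction down.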

Apparently this buffer-size parameter can have a significant effect on speed, for example:

In [1]: x = np.ones(int(1e7))

In [2]: timeit x.sum()
100 loops, best of 3: 4.88 ms per loop

In [3]: np.getbufsize()
Out[3]: 8192

In [4]: np.setbufsize(64)
Out[4]: 8192

In [5]: timeit x.sum()
10 loops, best of 3: 18 ms per loop
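The IPython session above can be reproduced as a plain script with the standard `timeit` module. The sketch below times `x.sum()` under several buffer sizes and always restores the previous buffer size; the helper name `time_sum_with_bufsize` is hypothetical. Sizes are kept to multiples of 16 because some NumPy versions reject other values in `np.setbufsize`, and absolute timings will depend on the machine and NumPy version.

```python
import timeit

import numpy as np


def time_sum_with_bufsize(x, bufsizes=(8192, 64), number=10, repeat=3):
    """Return {bufsize: best seconds per x.sum() call} (hypothetical helper).

    The previous buffer size is restored afterwards, even if timing fails.
    """
    results = {}
    for size in bufsizes:
        old = np.setbufsize(size)  # setbufsize returns the previous size
        try:
            times = timeit.repeat(x.sum, number=number, repeat=repeat)
            results[size] = min(times) / number
        finally:
            np.setbufsize(old)
    return results


if __name__ == "__main__":
    x = np.ones(int(1e7))
    for size, t in time_sum_with_bufsize(x).items():
        print(f"bufsize={size:5d}: {t * 1e3:.2f} ms per loop")
```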

Moreover, if I write a Fortran function such as:

function compute_sum(n, x) result(s)
    !f2py integer, intent(hide), depend(x) :: n = shape(x,0)
    !f2py double precision, intent(in)     :: x(n)
    !f2py double precision, intent(out)    :: s

    implicit none
    integer n, i
    double precision x(n)
    double precision s

    ! s is the function result; it is undefined until initialized
    s = 0.0d0
    do i=1,n
        s = s + x(i)
    end do
end function
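For checking the compiled module's results, a pure-Python mirror of the same loop can be handy. The sketch below (hypothetical name `compute_sum_reference`) adds the elements one at a time, left to right, into a single accumulator that starts at zero; the Fortran routine likewise needs `s = 0.0d0` before its loop, because a Fortran result variable starts out undefined. It is far too slow for benchmarking and is meant only as a correctness reference.

```python
import numpy as np


def compute_sum_reference(x):
    """Pure-Python mirror of the Fortran compute_sum loop (checking only).

    Uses one accumulator, initialized to zero, and adds elements sequentially.
    """
    s = 0.0
    for value in np.asarray(x, dtype=np.float64):
        s += value
    return s
```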

and compile it with f2py:

f2py -c mysum.f90 -m mysum

then the Fortran code takes about twice as long as NumPy's sum:

In [1]: import mysum

In [2]: x = np.ones(int(1e7))

In [3]: timeit mysum.compute_sum(x)
100 loops, best of 3: 10.2 ms per loop

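I suppose there is a little overhead in passing data back and forth to the Fortran code (although the array is not copied), but surely not more than 5 ms!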
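One way to check the overhead assumption is to time each function on both the large array and a 1-element array: the 1-element timing roughly approximates the fixed per-call cost (argument checking, wrapper code), which can be compared with the gap seen on the large array. The sketch below uses hypothetical helper names (`best_ms_per_call`, `compare`) and assumes the f2py-built `mysum` module from the question is importable; if it is not, only `np.sum` is timed.

```python
import timeit

import numpy as np


def best_ms_per_call(func, arg, number=20, repeat=3):
    """Best-of-`repeat` time of func(arg), in milliseconds per call."""
    times = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(times) / number * 1e3


def compare(funcs, n=int(1e7)):
    """Return {name: (ms on n ones, ms on 1 element)} for each function."""
    big, tiny = np.ones(n), np.ones(1)
    return {name: (best_ms_per_call(f, big), best_ms_per_call(f, tiny))
            for name, f in funcs.items()}


if __name__ == "__main__":
    funcs = {"np.sum": np.sum}
    try:
        import mysum  # built with: f2py -c mysum.f90 -m mysum (assumed present)
        funcs["mysum.compute_sum"] = mysum.compute_sum
    except ImportError:
        print("mysum not found; build it with: f2py -c mysum.f90 -m mysum")
    for name, (t_big, t_tiny) in compare(funcs).items():
        print(f"{name:18s} {t_big:8.3f} ms (n=1e7)   {t_tiny:8.4f} ms (n=1)")
```

If the 1-element time is tiny compared with the 5 ms gap, call overhead alone cannot explain the difference.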

Two questions:

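  • What exactly do the NumPy docs mean by "buffers are used for misaligned data, etc.", and how does this affect the way sum is computed?
  • For a given buffer size, can analogous Fortran code (i.e., code that does what NumPy's sum does) run nearly as fast?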
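To make the three cases named in the quoted documentation concrete, the sketch below builds one array for each: misaligned data (float64 values starting one byte into a uint8 allocation), byte-swapped data (the opposite of the machine's native byte order), and data that must be converted (float32 summed as float64). The helper name `make_buffered_cases` is hypothetical, and whether and how a given NumPy version actually buffers each case internally is not shown here; the sketch only exposes the relevant `flags.aligned` and `dtype.isnative` properties.

```python
import numpy as np


def make_buffered_cases(n=1000):
    """Return {name: (array, sum_kwargs)}; each array holds n ones."""
    # 1. Misaligned: float64 data starting one byte into a uint8 allocation
    raw = np.zeros(n * 8 + 1, dtype=np.uint8)
    misaligned = raw[1:].view(np.float64)
    misaligned[:] = 1.0

    # 2. Byte-swapped: opposite of the machine's native byte order
    swapped = np.ones(n, dtype=np.dtype(np.float64).newbyteorder())

    # 3. Needs conversion: float32 data reduced in float64
    single = np.ones(n, dtype=np.float32)

    return {
        "misaligned": (misaligned, {}),
        "byte-swapped": (swapped, {}),
        "cast float32->float64": (single, {"dtype": np.float64}),
    }


if __name__ == "__main__":
    for name, (arr, kwargs) in make_buffered_cases().items():
        print(f"{name:22s} aligned={arr.flags.aligned!s:5} "
              f"native={arr.dtype.isnative!s:5} sum={arr.sum(**kwargs)}")
```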

0 answers