NumPy uses an internal buffer for ufuncs.
Evidently the buffer size parameter has a large effect on speed. For example:
In [1]: x = np.ones(int(1e7))
In [2]: timeit x.sum()
100 loops, best of 3: 4.88 ms per loop
In [3]: np.getbufsize()
Out[3]: 8192
In [4]: np.setbufsize(64)
Out[4]: 8192
In [5]: timeit x.sum()
10 loops, best of 3: 18 ms per loop
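The comparison above can also be scripted outside IPython. The following is only a minimal sketch: the helper name `time_sum` and the array size are assumptions made for illustration, and on recent NumPy versions the buffer size may have little or no measurable effect on a contiguous `sum`.

```python
import timeit
import numpy as np

def time_sum(x, bufsize, repeats=5):
    """Best-of-`repeats` time in seconds for x.sum() with the given ufunc buffer size.

    `time_sum` is a hypothetical helper, not part of NumPy.
    """
    old = np.setbufsize(bufsize)  # setbufsize returns the previous buffer size
    try:
        return min(timeit.repeat(x.sum, number=1, repeat=repeats))
    finally:
        np.setbufsize(old)  # always restore the previous setting

x = np.ones(int(1e6))  # smaller than in the question, to keep the run short
default_size = np.getbufsize()
for size in (64, 1024, default_size):
    print(size, time_sum(x, size))
```

Restoring the old value in the `finally` clause keeps one measurement from leaking into the next, which is easy to overlook when calling `np.setbufsize` by hand.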
Furthermore, suppose I write a Fortran function:
function compute_sum(n, x) result(s)
    !f2py integer, intent(hide), depend(x) :: n = shape(x,0)
    !f2py double precision, intent(in) :: x(n)
    !f2py double precision, intent(out) :: s
    implicit none
    integer n, i
    double precision x(n)
    double precision s
    ! the accumulator must start from zero; it was left uninitialized before
    s = 0d0
    do i=1,n
        s = s + x(i)
    end do
end function
and compile it with f2py:
f2py -c mysum.f90 -m mysum
The Fortran code then turns out to be about twice as slow as NumPy's sum (10.2 ms versus 4.88 ms):
In [1]: import mysum
In [2]: x = np.ones(int(1e7))
In [3]: timeit mysum.compute_sum(x)
100 loops, best of 3: 10.2 ms per loop
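I would expect a little overhead when passing data to and from the Fortran code (even though the array is not copied), but surely not more than 5 ms!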
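One way to check the overhead guess is to time the same function on a one-element array, where the fixed call cost dominates, and on a large array. This is only a rough sketch: `best_time` and `call_overhead_and_per_element` are hypothetical helper names, and `np.sum` stands in here so the snippet runs on its own. With the compiled module available, `mysum.compute_sum` could be passed instead.

```python
import timeit
import numpy as np

def best_time(func, arg, number=100, repeat=5):
    """Best per-call time in seconds for func(arg)."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number

def call_overhead_and_per_element(func, n_large=int(1e6)):
    """Estimate the fixed call cost and the cost per array element of `func`."""
    tiny = np.ones(1)
    large = np.ones(n_large)
    overhead = best_time(func, tiny)  # dominated by the fixed call cost
    per_element = (best_time(func, large, number=10) - overhead) / n_large
    return overhead, per_element

# np.sum is a stand-in; substitute mysum.compute_sum to test the wrapper.
overhead, per_element = call_overhead_and_per_element(np.sum)
print("fixed overhead per call (s):", overhead)
print("approximate cost per element (s):", per_element)
```

If the fixed overhead comes out in the microsecond range, the 5 ms gap has to come from the loop itself rather than from crossing the Python/Fortran boundary.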
Two questions:
1. How does NumPy compute sum?
2. How can the Fortran code be made nearly as fast as sum?