Question

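There was a post on this here: https://gist.github.com/JonathanRaiman/f2ce5331750da7b2d4e9 which shows a large speed-up from calling the Fortran libraries directly (BLAS / LAPACK / Intel MKL / OpenBLAS / whatever you installed with NumPy). After many hours of work (because the SciPy libraries it relied on have been deprecated), I finally got it to compile, though not with the result I wanted. It is 2x faster than NumPy. Unfortunately, as another user pointed out, the Fortran routine always adds the newly computed result to the output matrix, so it only matches NumPy on the first run, i.e. A := alpha*x*y.T + A. So a fast solution still remains to be found.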

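[Update: for those of you looking to use the SciPy interface, just go to https://github.com/scipy/scipy/blob/master/scipy/linalg/cython_blas.pyx, as they have already optimized the calls to BLAS / LAPACK in cpdef statements; just copy/paste into your Cython script. cython_lapack.pyx is also available at the link above, but without Cython test scripts.]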

TEST SCRIPT

# Python-accessible wrappers for testing:
import numpy as np
from cyblas import outer_prod

a = np.random.randint(0, 100, 1000)
b = np.random.randint(0, 100, 1000)
a = a.astype(np.float64)
b = b.astype(np.float64)
cy_outer = np.zeros((a.shape[0], b.shape[0]))
np_outer = np.zeros((a.shape[0], b.shape[0]))

%timeit outer_prod(a,b,cy_outer)
#%timeit outer_prod(a,b) #use with fixed version instead of above line, results will automatically update cy_outer
%timeit np.outer(a,b, np_outer)
100 loops, best of 3: 2.83 ms per loop
100 loops, best of 3: 6.58 ms per loop

#END TEST SCRIPT

PYX file to compile as cyblas.pyx (basically the np.ndarray version)

import cython
import numpy as np
cimport numpy as np

from cpython cimport PyCapsule_GetPointer
cimport scipy.linalg.cython_blas
cimport scipy.linalg.cython_lapack
import scipy.linalg as LA

REAL = np.float64
ctypedef np.float64_t REAL_t
ctypedef np.uint64_t INT_t

cdef int ONE = 1
cdef REAL_t ONEF = <REAL_t>1.0

ctypedef void (*dger_ptr) (const int *M, const int *N, const double *alpha, const double *X, const int *incX, double *Y, const int *incY, double *A, const int * LDA) nogil
cdef dger_ptr dger=<dger_ptr>PyCapsule_GetPointer(LA.blas.dger._cpointer, NULL)  # A := alpha*x*y.T + A

cpdef outer_prod(_x, _y, _output):
#cpdef outer_prod(_x, _y): #comment above line & use this to use the reset output matrix to zeros
    cdef REAL_t *x = <REAL_t *>(np.PyArray_DATA(_x))
    cdef int M = _y.shape[0]
    cdef int N = _x.shape[0]
    #cdef np.ndarray[np.float64_t, ndim=2, order='c'] _output = np.zeros((M,N)) #slow fix to uncomment to reset output matrix to zeros
    cdef REAL_t *y = <REAL_t *>(np.PyArray_DATA(_y))
    cdef REAL_t *output = <REAL_t *>(np.PyArray_DATA(_output))
    with nogil:
        dger(&M, &N, &ONEF, y, &ONE, x, &ONE, output, &M)

Many thanks. Hopefully this saves others some time (it almost works). Actually, as I commented, it works once and matches NumPy, then every subsequent call adds to the result matrix AGAIN. If I reset the output matrix to 0 and rerun, the result matches NumPy. Strange... although if one uncomments the few lines above, it works, but only at NumPy speed. An alternative has been found and will be in another post... I just haven't figured out exactly how to call it yet.
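The "matches once, then keeps adding" behaviour can be reproduced without compiling anything. The sketch below is only an illustration in plain NumPy: `rank1_update` is a hypothetical stand-in for the compiled `outer_prod` (it is not the Cython function itself), and `outer_prod_reset` shows one possible workaround, zeroing the reused buffer in place rather than allocating a fresh `np.zeros` matrix on every call. Whether this is faster than `np.outer` in the compiled version is not something this sketch measures.

```python
import numpy as np

def rank1_update(x, y, out):
    """Hypothetical stand-in for the compiled outer_prod: out := 1.0 * x * y.T + out."""
    out += np.outer(x, y)  # accumulates into out, like the BLAS rank-1 update
    return out

def outer_prod_reset(x, y, out):
    """Zero the reused buffer in place, then apply the rank-1 update."""
    out.fill(0.0)  # reset without allocating a new matrix
    return rank1_update(x, y, out)

rng = np.random.default_rng(0)
x = rng.integers(0, 100, 5).astype(np.float64)
y = rng.integers(0, 100, 5).astype(np.float64)
expected = np.outer(x, y)

buf = np.zeros((5, 5))
rank1_update(x, y, buf)
first_call_matches = np.array_equal(buf, expected)       # first run matches NumPy
rank1_update(x, y, buf)
second_call_doubled = np.array_equal(buf, 2 * expected)  # second run has added again

outer_prod_reset(x, y, buf)
outer_prod_reset(x, y, buf)
reset_matches = np.array_equal(buf, expected)            # stable across repeated calls
```

This also explains why a benchmark with %timeit looks wrong afterwards: it calls the function many times on the same buffer, so the buffer holds many accumulated copies of the outer product unless it is reset.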

Answer 1

According to netlib, dger performs A := alpha*x*y**T + A. So A should be all zeros to get the outer product of x and y.
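A quick way to check this from Python, without Cython, is the `dger` wrapper in `scipy.linalg.blas`. The sketch below assumes SciPy is installed and that, when the optional `a` argument is omitted, the wrapper starts from a zero matrix (this is my understanding of the wrapper, not something stated in the thread). The small vectors are made up for illustration.

```python
import numpy as np
from scipy.linalg.blas import dger

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0])

# No `a` passed: assumed to start from zeros, giving the plain outer product.
fresh = dger(1.0, x, y)

# Passing a non-zero `a` shows the accumulation: result = 1.0 * x * y.T + a.
prev = np.ones((3, 2), order="F")
accumulated = dger(1.0, x, y, a=prev)

print(np.array_equal(fresh, np.outer(x, y)))
print(np.array_equal(accumulated, np.outer(x, y) + prev))
```

If both comparisons print True, the extra values seen on repeated calls come from the contents of A, not from a bug in the pointer handling.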

