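This is related to a similar question, Calling BLAS / LAPACK directly using the SciPy interface and Cython, but it is different because here I am using the actual code from the SciPy example _test_dgemm: https://github.com/scipy/scipy/blob/master/scipy/linalg/cython_blas.pyx
It is very fast (5x faster than numpy.dot when the input matrices are passed in, 20x faster otherwise). However, it produces no result when it is passed an Mx1 and a 1xN vector. When matrices are passed, it produces the same values as numpy.dot. I have minimized the code for clarity, since no answers have been posted. Here is dgemm.pyx: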
import numpy as np
cimport numpy as np
from scipy.linalg.cython_blas cimport dgemm
from cython cimport boundscheck

@boundscheck(False)
cpdef int fast_dgemm(double[:,::1] a, double[:,::1] b, double[:,::1] c, double alpha=1.0, double beta=0.0) nogil except -1:
    cdef:
        char *transa = 'n'
        char *transb = 'n'
        int m, n, k, lda, ldb, ldc
        double *a0=&a[0,0]
        double *b0=&b[0,0]
        double *c0=&c[0,0]
    ldb = (&a[1,0]) - a0 if a.shape[0] > 1 else 1
    lda = (&b[1,0]) - b0 if b.shape[0] > 1 else 1
    k = b.shape[0]
    if k != a.shape[1]:
        with gil:
            raise ValueError("Shape mismatch in input arrays.")
    m = b.shape[1]
    n = a.shape[0]
    if n != c.shape[0] or m != c.shape[1]:
        with gil:
            raise ValueError("Output array does not have the correct shape.")
    ldc = (&c[1,0]) - c0 if c.shape[0] > 1 else 1
    dgemm(transa, transb, &m, &n, &k, &alpha, b0, &lda, a0,
          &ldb, &beta, c0, &ldc)
    return 0
Here is an example test script:
import numpy as np
# fast_dgemm is the function name defined in dgemm.pyx above
from dgemm import fast_dgemm

a = np.random.randn(1000)
b = np.random.randn(1000)
a.resize(len(a), 1)          # a becomes an Mx1 column vector
a = np.array(a, order='c')
b.resize(1, len(b))          # b becomes a 1xN row vector
b = np.array(b, order='c')
c = np.empty((a.shape[0], b.shape[1]), float, order='c')
fast_dgemm(a, b, c)
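The script above calls the routine but does not verify the output. A minimal sketch of such a check, assuming the compiled function takes `(a, b, c)` and writes into `c`: the names `check_against_numpy` and `reference_gemm` are illustrative and not part of the original script, and `reference_gemm` is only a stand-in so the sketch runs without the compiled extension. The output buffer is pre-filled with NaN so that a call which silently writes nothing is detected.

```python
import numpy as np


def check_against_numpy(gemm, a, b):
    """Call gemm(a, b, c) and report whether c matches numpy.dot(a, b)."""
    # NaN marks every cell the routine never wrote; np.allclose treats
    # NaN as unequal by default, so an untouched buffer fails the check.
    c = np.full((a.shape[0], b.shape[1]), np.nan)
    gemm(a, b, c)
    return bool(np.allclose(c, np.dot(a, b)))


def reference_gemm(a, b, c):
    """Hypothetical stand-in with the same (a, b, c) calling convention."""
    c[...] = np.dot(a, b)
    return 0
```

To test the compiled extension, pass the compiled function instead of `reference_gemm`, once with ordinary matrices and once with an Mx1 and a 1xN input.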
If you want to play with it on Windows with Python 3.5 x64, here is the setup.py. Build it by typing python setup.py build_ext --inplace --compiler=msvc at the command prompt:
from Cython.Distutils import build_ext
import numpy as np
import os

try:
    from setuptools import setup
    from setuptools import Extension
except ImportError:
    from distutils.core import setup
    from distutils.extension import Extension

module = 'dgemm'
ext_modules = [Extension(module, sources=[module + '.pyx'],
    include_dirs=['C://Program Files (x86)//Windows Kits//10//Include//10.0.10240.0//ucrt','C://Program Files (x86)//Microsoft Visual Studio 14.0//VC//include','C://Program Files (x86)//Windows Kits//8.1//Include//shared'],
    library_dirs=['C://Program Files (x86)//Windows Kits//8.1//bin//x64', 'C://Windows//System32', 'C://Program Files (x86)//Microsoft Visual Studio 14.0//VC//lib//amd64', 'C://Program Files (x86)//Windows Kits//8.1//Lib//winv6.3//um//x64', 'C://Program Files (x86)//Windows Kits//10//Lib//10.0.10240.0//ucrt//x64'],
    extra_compile_args=['/Ot', '/favor:INTEL64', '/EHsc', '/GA'],
    language='c++')]

setup(
    name = module,
    ext_modules = ext_modules,
    cmdclass = {'build_ext': build_ext},
    include_dirs = [np.get_include(), os.path.join(np.get_include(), 'numpy')]
)
Any help is greatly appreciated!
Answer 0 (score: 2)
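If I see it right, you are trying to use Fortran routines on arrays with C memory layout.
Even though you obviously know this, I would first like to elaborate on row-major order (C memory layout) and column-major order (Fortran memory layout), in order to derive my answer.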
So, if we have a 2x3 matrix A (i.e. 2 rows and 3 columns) and store it in contiguous memory, we get:
row-major-order(A) = A11, A12, A13, A21, A22, A23
col-major-order(A) = A11, A21, A12, A22, A13, A23
This means that if we are given contiguous memory that represents a matrix in row-major order and interpret it as a matrix in column-major order, we get quite a different matrix!
However, if we look at the transposed matrix A^t, we can easily see:
row-major-order(A) = col-major-order(A^t)
col-major-order(A) = row-major-order(A^t)
This means that if we want to get the matrix C in row-major order as the result, the BLAS routine should write the transposed matrix C^t in column-major order (which, after all, we cannot change) into this very memory. However, C^t = (AB)^t = B^t * A^t, and B^t and A^t are just the original matrices reinterpreted in column-major order.
Now, let A be an n x k matrix and B a k x m matrix. The call to the dgemm routine should then be as follows:
dgemm(transa, transb, &m, &n, &k, &alpha, b0, &m, a0, &k, &beta, c0, &m)
As you can see, you switched some n and m in your code.
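To make the argument concrete, here is a minimal sketch in plain NumPy that emulates a column-major dgemm on flat buffers and then calls it with the operands swapped, as in the call above. All names (`colmajor_view`, `dgemm_nn`, `matmul_row_major`, `question_leading_dims`) are illustrative and not SciPy APIs, and `dgemm_nn` is a stand-in, not the real BLAS routine. The leading-dimension check is an assumption modelled on the reference BLAS documentation for the non-transposed case (lda >= max(1, m), ldb >= max(1, k), ldc >= max(1, m)). Under that assumption, the question's fallback of 1 for a single-row b would be rejected whenever m > 1, which may be why the Mx1 times 1xN case produces no result.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def colmajor_view(buf, rows, cols, ld):
    """View a flat buffer as a rows x cols column-major matrix with leading dimension ld."""
    item = buf.itemsize
    return as_strided(buf, shape=(rows, cols), strides=(item, ld * item))


def dgemm_nn(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """NumPy stand-in for dgemm with transa = transb = 'n' (illustrative only).

    Computes C = alpha * A @ B + beta * C on column-major buffers,
    where A is m x k, B is k x n and C is m x n.
    """
    # Assumed to mirror the argument checks of reference BLAS.
    if lda < max(1, m) or ldb < max(1, k) or ldc < max(1, m):
        raise ValueError("invalid leading dimension")
    A = colmajor_view(a, m, k, lda)
    B = colmajor_view(b, k, n, ldb)
    C = colmajor_view(c, m, n, ldc)
    if beta == 0.0:
        C[...] = alpha * (A @ B)  # C is not read when beta is zero
    else:
        C[...] = alpha * (A @ B) + beta * C
    return c


def matmul_row_major(a, b):
    """Compute C = A @ B for C-contiguous inputs via C^t = B^t * A^t."""
    a = np.ascontiguousarray(a, dtype=np.float64)
    b = np.ascontiguousarray(b, dtype=np.float64)
    n, k = a.shape
    m = b.shape[1]
    c = np.zeros((n, m))
    # Row-major B (k x m) is column-major B^t (m x k) with leading dimension m,
    # row-major A (n x k) is column-major A^t (k x n) with leading dimension k,
    # and column-major C^t (m x n) with leading dimension m is row-major C.
    dgemm_nn(m, n, k, 1.0, b.ravel(), m, a.ravel(), k, 0.0, c.ravel(), m)
    return c


def question_leading_dims(a, b):
    """Leading dimensions (lda, ldb, ldc) as the question's code derives them."""
    n, m = a.shape[0], b.shape[1]
    lda = b.shape[1] if b.shape[0] > 1 else 1  # row stride of b, else fallback 1
    ldb = a.shape[1] if a.shape[0] > 1 else 1  # row stride of a, else fallback 1
    ldc = m if n > 1 else 1                    # row stride of c, else fallback 1
    return lda, ldb, ldc
```

For a 5x1 `a` and a 1x4 `b`, `matmul_row_major(a, b)` agrees with `a @ b`, while `question_leading_dims(a, b)` gives lda = 1 although m = 4, and passing that value to `dgemm_nn` raises the `ValueError`.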