I would like to know how the GEMM transpose option works. I have a matrix that I want to multiply by its own transpose, i.e. A.T * A.
I have something like this:
def bptrs(a):
    return gpuarray.arange(a.ptr,a.ptr+a.shape[0]*a.strides[0],a.strides[0],dtype=ctypes.c_void_p)
handle=cublasCreate()
A=np.ones((s,3)).astype(np.float64)
B=A.T # transposed
m,k=A.shape
k,n=B.shape
a_gpu = gpuarray.to_gpu(A)
b_gpu = gpuarray.to_gpu(B) # I am guessing I need to do a copy since A.T is a view
c_gpu = gpuarray.empty((m,n), np.float64) #Not 100% sure if this is right. I want to get a view returned, so I can save on memory
alpha = np.float64(1.0)
beta = np.float64(0.0)
cublasDgemmBatched(handle, 't','n',
n, m, k, alpha,
b_arr.gpudata, m,
a_arr.gpudata, k,
beta, c_arr.gpudata, m, 1)
I am using cuBLAS 7.
Answer 0 (score: 3)
To compute A^T A, use the cuBLAS syrk function rather than gemm.
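To make the difference concrete, here is a small pure-Python model of what syrk computes. It is only a sketch of the BLAS semantics (C = alpha * op(A) op(A)^T + beta * C, writing a single triangle of C), not how cuBLAS implements it, and the name `syrk_colmajor` is made up for this illustration. It assumes the input is a row-major s x k buffer, as NumPy would produce by default.

```python
def syrk_colmajor(uplo, trans, n, k, alpha, a, lda, beta, c, ldc):
    """Reference model of SYRK on flat column-major buffers.

    Updates C = alpha * op(A) @ op(A)^T + beta * C in place, touching only the
    triangle named by uplo ('u' or 'l'). op(A) is n x k; element (row, col) of
    a column-major matrix with leading dimension ld sits at col * ld + row.
    """
    def op_a(row, col):
        # With 't', op(A)[row, col] is A[col, row].
        if trans == 't':
            row, col = col, row
        return a[col * lda + row]

    for j in range(n):
        rows = range(0, j + 1) if uplo == 'u' else range(j, n)
        for i in rows:
            acc = sum(op_a(i, p) * op_a(j, p) for p in range(k))
            c[j * ldc + i] = alpha * acc + beta * c[j * ldc + i]
    return c


# A is s x k, stored row-major and flattened.
s, k = 4, 3
A = [[float(i * k + j + 1) for j in range(k)] for i in range(s)]
a_flat = [x for row in A for x in row]

# A column-major reader with lda = k sees the buffer as the k x s matrix A^T,
# so A^T A is the 'n' case with n = k and inner dimension s.
c_flat = syrk_colmajor('l', 'n', k, s, 1.0, a_flat, k, 0.0, [0.0] * (k * k), k)

# Only the lower triangle was written; mirror it to recover the full matrix.
gram = [[c_flat[min(i, j) * k + max(i, j)] for j in range(k)] for i in range(k)]

# Direct computation of A^T A for comparison.
expected = [[sum(A[r][i] * A[r][j] for r in range(s)) for j in range(k)]
            for i in range(k)]
```

Because the result is symmetric, syrk only has to compute about half of the entries, which is why it is the better fit for A^T A than a general gemm.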
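If you want to understand gemm, you will have to read the documentation carefully. Do not perform the transpose in Python, because gemm has parameters to do it on the fly. Something like this should get you what you want: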
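# Assumes PyCUDA and scikit-cuda; the import paths are a guess at the asker's
# setup (older releases expose the same functions under scikits.cuda.cublas).
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.gpuarray as gpuarray
from skcuda.cublas import cublasCreate, cublasDgemm

s = ...
k = 3
handle = cublasCreate()
A = np.ones((s, k)).astype(np.float64)
a_gpu = gpuarray.to_gpu(A)
c_gpu = gpuarray.empty((k, k), np.float64)
alpha = np.float64(1.0)
beta = np.float64(0.0)
# cuBLAS is column-major, so it reads the row-major (s, k) NumPy buffer as the
# k x s matrix A^T with leading dimension k. A^T A is therefore op(A) = 'n'
# times op(B) = 't' applied to the same buffer.
cublasDgemm(handle,
            'n', 't',           # (A^T) (A^T)^T = A^T A
            k,                  # m: number of rows of op(A) and C
            k,                  # n: number of columns of op(B) and C
            s,                  # k: number of columns of op(A) and rows of op(B)
            alpha,
            a_gpu.gpudata, k,   # lda >= max(1, m) because transa is 'n'
            a_gpu.gpudata, k,   # ldb >= max(1, n) because transb is 't'
            beta,
            c_gpu.gpudata, k)   # ldc >= max(1, m)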
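If the interaction between the transpose flags, the leading dimensions and NumPy's row-major layout is hard to picture, a pure-Python model of column-major GEMM may help. This is a sketch of the BLAS indexing rules only, not cuBLAS itself; `gemm_colmajor` is a hypothetical helper written for this example, and the 4 x 3 test matrix is arbitrary.

```python
def gemm_colmajor(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Reference model of GEMM on flat column-major buffers.

    Computes C = alpha * op(A) @ op(B) + beta * C in place, where op(X) is X
    for 'n' and X^T for 't'. Element (row, col) of a column-major matrix with
    leading dimension ld sits at flat index col * ld + row.
    """
    def at(buf, ld, row, col, trans):
        # With 't', op(X)[row, col] is X[col, row].
        if trans == 't':
            row, col = col, row
        return buf[col * ld + row]

    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += at(a, lda, i, p, transa) * at(b, ldb, p, j, transb)
            c[j * ldc + i] = alpha * acc + beta * c[j * ldc + i]
    return c


# A is s x k, stored row-major (NumPy's default) and flattened.
s, k = 4, 3
A = [[float(i * k + j + 1) for j in range(k)] for i in range(s)]
a_flat = [x for row in A for x in row]

# Direct computation of A^T A for comparison.
expected = [[sum(A[r][i] * A[r][j] for r in range(s)) for j in range(k)]
            for i in range(k)]

# Read column-major with ld = k, the buffer is the k x s matrix A^T,
# so A^T A is 'n' times 't' with m = n = k and inner dimension s.
c_flat = gemm_colmajor('n', 't', k, k, s, 1.0, a_flat, k, a_flat, k, 0.0,
                       [0.0] * (k * k), k)
result = [[c_flat[j * k + i] for j in range(k)] for i in range(k)]

# Using 't', 'n' with ld = s instead reinterprets the same bytes as a
# column-major s x k matrix, which is a different matrix from A.
wrong_flat = gemm_colmajor('t', 'n', k, k, s, 1.0, a_flat, s, a_flat, s, 0.0,
                           [0.0] * (k * k), k)
```

With an all-ones input both calls give the same answer, which is why a layout mistake is easy to miss when testing with `np.ones`; a matrix with distinct entries, as above, exposes it.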