Julia CUDArt - Compile ptx Module Using extern Function

Time: 2016-07-11 21:10:48

Tags: cuda julia julia-gpu

I am trying to create a ptx module to implement a CUBLAS function in order to answer this currently unresolved SO question. I want to be able to define a function that can then be executed using launch() or some similar utility.

As a base guide, I am looking at this page which gives examples from a script calling on the CUBLAS functions. I am also looking at this example from the CUDArt GitHub site for further insights. My current working effort looks something like this:

#include <cublas_v2.h>
extern "C" 
// Multiply the arrays A and B on GPU and save the result in C
// C(m,n) = A(m,k) * B(k,n)
__device__ void gpu_blas_mmul(const float *A, const float *B, float *C, const int m, const int k, const int n) {
    int lda=m,ldb=k,ldc=m;
    const float alf = 1;
    const float bet = 0;
    const float *alpha = &alf;
    const float *beta  = &bet;

    // Create a handle for CUBLAS
    cublasHandle_t handle;
    cublasCreate(&handle);

    // Do the actual multiplication
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);

    // Destroy the handle
    cublasDestroy(handle);
}

I then compile the file and check the result with something like this:

nvcc -ptx -gencode=arch=compute_35,code=sm_35 -lcublas gpu_blas_mmul.cu
ptxas -arch=sm_35 gpu_blas_mmul.ptx

When I do this, I get the following error:

Unresolved extern function 'cublasCreate_v2'

If I remove the __device__ from the beginning of the script, I no longer get the error. But, when I then try to load the function in CUDArt using:

using CUDArt
md = CuModule("path/to/gpu_blas_mmul.ptx", false)
gpu_blas_mmul = CuFunction(md, "gpu_blas_mmul")

I get the error:

ERROR: Named symbol not found
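
One way to narrow this down is to list which kernel entry points the PTX file actually exposes. As far as I understand, looking up a function in a loaded module only resolves names declared with `.entry` (that is, `__global__` kernels), so a `__device__` or host-only function would not be found by name. The following is a minimal sketch, assuming a POSIX shell with grep and awk; `list_ptx_entries` is a hypothetical helper name, not part of CUDA or CUDArt:

```shell
# Hypothetical helper: print the name of every kernel entry point
# (".entry <name>") declared in the PTX file given as the first argument.
list_ptx_entries() {
    grep -oE '\.entry[[:space:]]+[A-Za-z_$][A-Za-z0-9_$]*' "$1" | awk '{print $2}'
}

# Usage: inspect the module from the question, if it exists in this directory.
if [ -f gpu_blas_mmul.ptx ]; then
    list_ptx_entries gpu_blas_mmul.ptx
fi
```

If `gpu_blas_mmul` does not appear in the output, the module contains no kernel by that name, which would be consistent with the "Named symbol not found" error.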

I've looked through this and this SO post, as well as this resource. I've tried the simple solutions they suggest, such as using __device__ __host__ in my script and -nc when I compile with nvcc. I haven't dug into the articles in depth, however, partly because they seem to describe considerably more complex situations in which multiple scripts are tied together. That seems more complex than should be necessary here, and I'm not certain it would even work.

I've written other kernels and successfully compiled and launched them with CUDArt before, but this one that uses the cuBLAS library seems to be defeating me.

How can I resolve these issues so that I can compile and then launch the function specified in my script? I'm not clear which part of this process is breaking down, or even whether my overall approach to the problem is appropriate.

Note: I've also tried replacing __device__ with __global__ and compiling with all the different architecture options; neither resolves the issue.

0 answers:

No answers