managedCUDA: running my own kernel - error: InvalidValue

Asked: 2014-09-01 11:11:44

Tags: c# cuda

I am new to CUDA programming. I am trying to load a CUDA kernel in C# using managedCUDA, but I keep getting an error when I call the kernel's run method:

ErrorInvalidValue: This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.

I get this error even though my kernel takes no parameters, whether I call kernel.Run() or kernel.RunAsync((new CudaStream()).Stream).

Does anyone know what is wrong, or can anyone point me in the right direction? Any help is much appreciated!

My kernel.cu code is:

#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include "device_launch_parameters.h"

extern "C"  {
    __global__ void func()
    {
        const int numThreads = blockDim.x * gridDim.x;
        const int threadID = blockIdx.x * blockDim.x + threadIdx.x;
        //do nothing
    }
}

The generated kernel.ptx:

    .version 1.4
    .target sm_11, map_f64_to_f32
    // compiled with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\bin/../open64/lib//be.exe
    // nvopencc 4.1 built on 2014-03-14
    ...some more comments...

    .file   1   "<filename>.gpu"
     ...next 33 .file(s)

    .entry func
    {
    .loc    15  52  0
$LDWbegin_func:
    .loc    15  57  0
    exit;
$LDWend_func:
    } // func

And the C# program:

        string resName;
        if (IntPtr.Size == 8)
            resName = "kernel_64.ptx";
        else
            resName = "kernel.ptx";

        string resNamespace = "signalViewer.CUDA";
        string resource = resNamespace + "." + resName;
        Stream stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resource);
        if (stream == null) throw new ArgumentException("Kernel not found in resources.");

        CudaKernel func= ctx.LoadKernelPTX(stream, "func");

        dim3 threads = new dim3(512, 1);
        dim3 blocks = new dim3(N / (int)threads.x, 1);

        func.BlockDimensions = threads;
        func.GridDimensions = blocks;
        func.RunAsync((new CudaStream()).Stream);
        //func.Run();

1 answer:

Answer 0 (score: 1)

Thanks to @Jez: my block count was equal to zero.

dim3 threads = new dim3(512, 1);
dim3 blocks = new dim3(N / (int)threads.x, 1); 

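N is 64, so the integer division 64 / 512 evaluates to 0. The grid's x dimension was therefore 0, which is not a valid launch configuration, and that is what produced ErrorInvalidValue.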

A better solution is:

    int maxThreads = Math.Min(ctx.GetDeviceInfo().MaxThreadsPerBlock, N);
    dim3 threads = new dim3(maxThreads, 1);
    dim3 blocks = new dim3((N + maxThreads - 1)  / maxThreads, 1);
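
The rounding-up logic in the fix can be separated from managedCUDA and checked on its own. The following is only a minimal sketch: `LaunchConfig` and `For` are hypothetical names, not part of managedCUDA, and the limit passed as `maxThreadsPerBlock` is assumed to come from the device info as in the snippet above.

```csharp
using System;

// Hypothetical helper for illustration; not part of managedCUDA.
static class LaunchConfig
{
    // Returns a 1D launch configuration that covers n elements.
    // Both results are always >= 1, so the grid dimension can never be zero.
    public static (int Threads, int Blocks) For(int n, int maxThreadsPerBlock)
    {
        if (n <= 0)
            throw new ArgumentOutOfRangeException(nameof(n), "Need at least one element.");
        if (maxThreadsPerBlock <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxThreadsPerBlock));

        // Never request more threads per block than there are elements.
        int threads = Math.Min(maxThreadsPerBlock, n);

        // Ceiling division written as (n - 1) / threads + 1,
        // which avoids the overflow that n + threads - 1 could hit for very large n.
        int blocks = (n - 1) / threads + 1;

        return (threads, blocks);
    }
}
```

With n = 64 and a limit of 512 this yields 64 threads and 1 block, where the original `N / 512` yielded 0 blocks. With n = 1000 it yields 512 threads and 2 blocks; in that case the last block contains surplus threads, so a kernel that indexes data should compare its thread index against n before touching memory.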