无法从静态初始化代码启动CUDA内核

时间:2014-07-21 15:43:52

标签: c++ cuda global-variables static-initialization

我有一个在其构造函数中调用内核的类,如下所示:

" ScalarField.h"

#include <iostream>

    void ERROR_CHECK(cudaError_t err,const char * msg) {
        if(err!=cudaSuccess) {
            std::cout << msg << " : " << cudaGetErrorString(err) << std::endl;
            std::exit(-1);
        }
    }

    class ScalarField {
    public:
        float* array;
        int dimension;

        ScalarField(int dim): dimension(dim) {
            std::cout << "Scalar Field" << std::endl;
            ERROR_CHECK(cudaMalloc(&array, dim*sizeof(float)),"cudaMalloc");
        }
    };

&#34; classA.h&#34;

#include "ScalarField.h"


static __global__ void KernelSetScalarField(ScalarField v) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < v.dimension) v.array[index] = 0.0f;
}

class A {
public:
    ScalarField v;

    A(): v(ScalarField(3)) {
        std::cout << "Class A" << std::endl;
        KernelSetScalarField<<<1, 32>>>(v);
        ERROR_CHECK(cudaGetLastError(),"Kernel");
    }
};

&#34; main.cu&#34;

#include "classA.h"

A a_object;

int main() {
    std::cout << "Main" << std::endl;
    return 0;
}

如果我在main(A a_object;)上实例化这个类,我没有错误。但是,如果我在main之外实例化它,在定义它之后(class A {...} a_object;)我会得到一个&#34;无效的设备函数&#34;内核启动时出错。为什么会这样?

修改

更新了代码以提供更完整的示例。

编辑2

根据Raxvan的评论中的建议,我想说我在ScalarField构造函数中使用的dimensions变量也在main之外定义(在另一个类中),但在其他所有之前。这可能是解释吗?调试器虽然显示了dimensions的正确值。

1 个答案:

答案 0 :(得分:12)

简短版本:

class A在main之外实例化时出现问题的根本原因是,在{{1}的构造函数之前没有运行使用内核初始化CUDA运行时库所需的特定钩子例程正在被召唤。发生这种情况是因为无法保证在C ++执行模型中实例化和初始化静态对象的顺序。在初始化执行CUDA设置的全局范围对象之前,将实例化您的全局范围类。您的内核代码在调用之前永远不会被加载到上下文中,并且会导致运行时错误。

据我所知,这是CUDA运行时API的真正限制,而不是用户代码中容易修复的内容。在您的简单示例中,您可以通过调用class A或其中一个非基于符号的运行时API memset函数替换内核调用,它将起作用。此问题完全限于通过运行时API在运行时加载的用户内核或设备符号。因此,空的默认构造函数也可以解决您的问题。从设计的角度来看,我会非常怀疑在构造函数中调用内核的任何模式。为没有依赖默认构造函数或析构函数的类GPU设置/拆卸添加特定方法将是一个更清洁,更不容易出错的设计,恕我直言。

详细信息:

有一个内部生成的例程(cudaMemset)必须运行以在内核之前使用CUDA驱动程序API加载和注册任何运行时API程序的fatbin有效内容中包含的内核,纹理和静态定义的设备符号可以无误地调用。这是&#34; lazy&#34;的一部分。运行时API的上下文初始化功能。您可以按照以下方式自行确认:

这是您发布的修订示例的gdb跟踪。注意我在__cudaRegisterFatBinary中插入了一个断点,在调用静态__cudaRegisterFatBinary构造函数并且内核启动失败之前就没达到它:

A

这是相同的过程,这次talonmies@box:~$ gdb a.out GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04 Copyright (C) 2012 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions, please see: <http://bugs.launchpad.net/gdb-linaro/>... Reading symbols from /home/talonmies/a.out...done. (gdb) break '__cudaRegisterFatBinary' Breakpoint 1 at 0x403180 (gdb) run Starting program: /home/talonmies/a.out [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Scalar Field [New Thread 0x7ffff5a63700 (LWP 10774)] Class A Kernel : invalid device function [Thread 0x7ffff5a63700 (LWP 10774) exited] [Inferior 1 (process 10771) exited with code 0377] A内的main实例化(保证在执行延迟设置的对象初始化后发生):

talonmies@box:~$ cat main.cu
#include "classA.h"


int main() {
    A a_object;
    std::cout << "Main" << std::endl;
    return 0;
}

talonmies@box:~$ nvcc --keep -arch=sm_30 -g main.cu
talonmies@box:~$ gdb a.out 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/talonmies/a.out...done.
(gdb) break '__cudaRegisterFatBinary' 
Breakpoint 1 at 0x403180
(gdb) run
Starting program: /home/talonmies/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x0000000000403180 in __cudaRegisterFatBinary ()
(gdb) cont
Continuing.
Scalar Field
[New Thread 0x7ffff5a63700 (LWP 11084)]
Class A
Main
[Thread 0x7ffff5a63700 (LWP 11084) exited]
[Inferior 1 (process 11081) exited normally]

如果这对您来说确实是一个严重的问题,我建议您联系NVIDIA开发人员支持并提出错误报告。