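I'm trying to write an OpenCL wrapper in C++. Yesterday I was working on my Windows 10 machine (NVIDIA GTX970 Ti, with what I believe are the latest NVIDIA GeForce drivers), and my code worked flawlessly.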
Today I tried it on my laptop (Arch Linux, AMD Radeon R7 M265, Mesa 17.3.3) and got a segfault when trying to create a command queue.
Here is the backtrace from GDB:
#0 0x00007f361119db80 in ?? () from /usr/lib/libMesaOpenCL.so.1
#1 0x00007f36125dacb1 in clCreateCommandQueueWithProperties () from /usr/lib/libOpenCL.so.1
#2 0x0000557b2877dfec in OpenCL::createCommandQueue (ctx=..., dev=..., outOfOrderExec=false, profiling=false) at /home/***/OpenCL/Util.cpp:296
#3 0x0000557b2876f0cf in main (argc=1, argv=0x7ffd04fcdac8) at /home/***/main.cpp:27
#4 0x00007f361194cf4a in __libc_start_main () from /usr/lib/libc.so.6
#5 0x0000557b2876ecfa in _start ()
(I've censored parts of the paths.) Here is the code that produces the error:
CommandQueue createCommandQueue(Context ctx, Device dev, bool outOfOrderExec, bool profiling) noexcept
{
    cl_command_queue_properties props[3] = {CL_QUEUE_PROPERTIES, 0, 0};
    if (outOfOrderExec)
    {
        props[1] |= CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE;
    }
    if (profiling)
    {
        props[1] |= CL_QUEUE_PROFILING_ENABLE;
    }
    int error = CL_SUCCESS;
    cl_command_queue queue = clCreateCommandQueueWithProperties(ctx.get(), dev.get(), props, &error);
    if (error != CL_SUCCESS)
    {
        std::cerr << "Error while creating command queue: " << OpenCL::getErrorString(error) << std::endl;
    }
    CommandQueue commQueue = CommandQueue(queue);
    Session::get().registerQueue(commQueue);
    return commQueue;
}
The clCreateCommandQueueWithProperties line is where the segfault occurs.
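The properties array itself looks well-formed: clCreateCommandQueueWithProperties expects a zero-terminated list of key/value pairs, and {CL_QUEUE_PROPERTIES, flags, 0} is exactly one pair plus the terminator. The sketch below illustrates how such a list is read. findProperty and buildQueueProperties are hypothetical helpers, and the constants are assumed to match the values in the Khronos CL/cl.h header so the snippet compiles without the OpenCL SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Stand-ins for the cl.h definitions (assumption: same values as CL/cl.h).
// cl_queue_properties is a 64-bit cl_bitfield.
using queue_props_t = std::uint64_t;
constexpr queue_props_t kQueueProperties = 0x1093;                   // CL_QUEUE_PROPERTIES
constexpr queue_props_t kOutOfOrderExec  = queue_props_t{1} << 0;    // CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
constexpr queue_props_t kProfilingEnable = queue_props_t{1} << 1;    // CL_QUEUE_PROFILING_ENABLE

// Hypothetical helper: scans a zero-terminated {key, value, ..., 0} list and
// returns the value stored under `key`, or nothing if the key is absent.
// A null list is treated as "no properties", which the OpenCL API allows.
std::optional<queue_props_t> findProperty(const queue_props_t* props, queue_props_t key)
{
    if (props == nullptr)
    {
        return std::nullopt;
    }
    // Keys sit at even indices; a zero key terminates the list.
    for (std::size_t i = 0; props[i] != 0; i += 2)
    {
        if (props[i] == key)
        {
            return props[i + 1];
        }
    }
    return std::nullopt;
}

// Builds the same three-element list as createCommandQueue in the question.
void buildQueueProperties(bool outOfOrderExec, bool profiling, queue_props_t (&props)[3])
{
    props[0] = kQueueProperties;
    props[1] = 0;
    props[2] = 0; // terminator
    if (outOfOrderExec)
    {
        props[1] |= kOutOfOrderExec;
    }
    if (profiling)
    {
        props[1] |= kProfilingEnable;
    }
}
```

For example, after buildQueueProperties(false, true, props), findProperty(props, kQueueProperties) yields kProfilingEnable, and the scan stops at props[2]. Since the list parses cleanly, the crash more likely lies in which entry point is being called than in its arguments.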
Context is a wrapper class for cl_context, and Context::get() returns the raw cl_context:
class Context
{
private:
    ...
    cl_context context;
public:
    ...
    cl_context get() const noexcept;
    ...
};
Device is a wrapper for cl_device_id, and Device::get() likewise returns the raw cl_device_id:
class Device
{
private:
    ...
    cl_device_type type;
    cl_device_id id;
public:
    ...
    cl_device_id get() const noexcept;
    cl_device_type getType() const noexcept;
    ...
};
Here is the main function:
int main(int argc, char* argv[])
{
    OpenCL::Session::get().init();
    for (const std::string& deviceAddress : OpenCL::Session::get().getAddresses())
    {
        std::cout << "[" << deviceAddress << "]: " << OpenCL::Session::get().getDevice(deviceAddress);
    }
    OpenCL::Context ctx = OpenCL::getContext();
    std::cout << "OpenCL version: " << ctx.getVersionString() << std::endl;
    OpenCL::Kernel kernel = OpenCL::createKernel(OpenCL::createProgram("src/Kernels/Hello.cl", ctx), "SAXPY");
    OpenCL::CommandQueue queue = OpenCL::createCommandQueue(ctx, OpenCL::Session::get().getDevice(ctx.getAssociatedDevices()[0]));
    unsigned int testDataSize = 1 << 13;
    std::vector<float> a = std::vector<float>(testDataSize);
    std::vector<float> b = std::vector<float>(testDataSize);
    for (int i = 0; i < testDataSize; i++)
    {
        a[i] = static_cast<float>(i);
        b[i] = 0.0;
    }
    OpenCL::Buffer aBuffer = OpenCL::allocateBuffer(ctx, a.data(), sizeof(float), a.size());
    OpenCL::Buffer bBuffer = OpenCL::allocateBuffer(ctx, b.data(), sizeof(float), b.size());
    kernel.setArgument(0, aBuffer);
    kernel.setArgument(1, bBuffer);
    kernel.setArgument(2, 2.0f);
    OpenCL::Event saxpy_event = queue.enqueue(kernel, {testDataSize});
    OpenCL::Event read_event = queue.read(bBuffer, b.data(), bBuffer.size());
    std::cout << "SAXPY kernel took " << saxpy_event.getRunTime() << "ns to complete." << std::endl;
    std::cout << "Read took " << read_event.getRunTime() << "ns to complete." << std::endl;
    OpenCL::Session::get().cleanup();
    return 0;
}
(Profiling won't work because I disabled it, thinking it was the cause of the problem; re-enabling profiling does not fix the issue.)
Finally, here is the program's console output:
/home/***/cmake-build-debug/Main
[gpu0:0]: AMD - AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1): 6 compute units @ 825MHz
OpenCL version: OpenCL 1.1 Mesa 17.3.3
Signal: SIGSEGV (Segmentation fault)
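Both the context and the device objects seem to be created without any problems, so I really don't know what's causing the segfault.
Is it possible that I've found a bug in the Mesa driver, or am I missing something obvious?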
Edit: This person seems to have had a similar problem; unfortunately, his issue was just a C-style "forgot to allocate memory" bug.
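Second edit: I may have found a possible cause: CMake is finding, using, and linking against OpenCL 2.0, while my GPU only supports OpenCL 1.1. I'll look into this. I haven't yet found a way to roll back to OpenCL 1.1 on Arch Linux, but clinfo seems to work just fine, and so does Blender (which depends on OpenCL), so I don't think this is the problem.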
Here is the output from clinfo:
Number of platforms 1
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 17.3.3
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name Clover
Number of devices 1
Device Name AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1)
Device Vendor AMD
Device Vendor ID 0x1002
Device Version OpenCL 1.1 Mesa 17.3.3
Driver Version 17.3.3
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Available Yes
Device Profile FULL_PROFILE
Max compute units 6
Max clock frequency 825MHz
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Compiler Available Yes
Preferred work group size multiple 64
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 8 / 8 (cl_khr_fp16)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 2147483648 (2GiB)
Error Correction support No
Max memory allocation 1503238553 (1.4GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 32768 bits (4096 bytes)
Global Memory cache type None
Image support No
Local memory type Local
Local memory size 32768 (32KiB)
Max constant buffer size 1503238553 (1.4GiB)
Max number of constant args 16
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 0ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Clover
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [MESA]
clCreateContext(NULL, ...) [default] Success [MESA]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Clover
Device Name AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Clover
Device Name AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Clover
Device Name AMD OLAND (DRM 2.50.0 / 4.14.15-1-ARCH, LLVM 5.0.1)
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2
Third edit: I just ran the code on my NVIDIA machine and it works without problems. This is what the console shows:
[gpu0:0]: NVIDIA Corporation - GeForce GTX 970: 13 compute units @ 1253MHz
OpenCL version: OpenCL 1.2 CUDA 9.1.75
SAXPY kernel took 2368149686ns to complete.
Read took 2368158390ns to complete.
I also fixed the two things Andreas mentioned.
Answer 0 (score: 1)
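clCreateCommandQueueWithProperties was introduced in OpenCL 2.0. You should not use it with platforms and devices older than version 2.0 (such as the 1.1 and 1.2 shown in your logs).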
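The clinfo output in the question suggests why linking succeeds anyway: the ICD loader reports OpenCL 2.2 and therefore exports clCreateCommandQueueWithProperties, while the only platform behind it, Clover, reports OpenCL 1.1, which plausibly explains a crash inside libMesaOpenCL.so. One way to guard against this is to check the version at runtime before choosing the entry point. The sketch below assumes you already have the version string (the wrapper in the question prints it via ctx.getVersionString(), which produced "OpenCL 1.1 Mesa 17.3.3"; with the raw API it would come from clGetPlatformInfo with CL_PLATFORM_VERSION). parseOpenCLVersion, OpenCLVersion, and supportsQueueWithProperties are hypothetical helper names.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

struct OpenCLVersion
{
    int majorVersion = 0;
    int minorVersion = 0;
};

// Parses version strings such as "OpenCL 1.1 Mesa 17.3.3". CL_PLATFORM_VERSION
// and CL_DEVICE_VERSION are specified to start with "OpenCL <major>.<minor> ",
// followed by vendor-specific text, which is ignored here.
std::optional<OpenCLVersion> parseOpenCLVersion(const std::string& text)
{
    const std::string prefix = "OpenCL ";
    if (text.compare(0, prefix.size(), prefix) != 0)
    {
        return std::nullopt; // not an OpenCL version string
    }
    std::istringstream in(text.substr(prefix.size()));
    OpenCLVersion version;
    char dot = '\0';
    if (!(in >> version.majorVersion >> dot >> version.minorVersion) || dot != '.')
    {
        return std::nullopt; // malformed "<major>.<minor>" part
    }
    return version;
}

// clCreateCommandQueueWithProperties was introduced in OpenCL 2.0, so only
// call it when the platform reports at least 2.0; otherwise fall back to
// the 1.x entry point clCreateCommandQueue.
bool supportsQueueWithProperties(const OpenCLVersion& version)
{
    return version.majorVersion >= 2;
}
```

For example, parseOpenCLVersion("OpenCL 1.2 CUDA 9.1.75") yields 1.2, so this check would steer both machines in the question to the 1.x call, including the NVIDIA one whose driver happened to accept the 2.0 entry point.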
Answer 1 (score: 0)
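clCreateCommandQueue was only deprecated in OpenCL 2.0 (it is one of the 1.2 APIs that the 2.x headers guard with CL_USE_DEPRECATED_OPENCL_1_2_APIS), so it is the correct call on 1.1 and 1.2 platforms.
Its properties argument is a plain bitfield, which means you can use it with or without properties.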
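A minimal sketch of that fallback, assuming you already know the platform's major version (for example by parsing the version string the Context wrapper prints). planCommandQueue and QueueCreationPlan are hypothetical names, the constants are assumed to mirror CL/cl.h so the snippet builds without the OpenCL SDK, and the real OpenCL calls appear only in the trailing comment.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-ins for cl.h values (assumption: identical to CL/cl.h).
using cl_bits = std::uint64_t;
constexpr cl_bits kQueueProperties = 0x1093;           // CL_QUEUE_PROPERTIES
constexpr cl_bits kOutOfOrderExec  = cl_bits{1} << 0;  // CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
constexpr cl_bits kProfilingEnable = cl_bits{1} << 1;  // CL_QUEUE_PROFILING_ENABLE

// Hypothetical description of which entry point to call and with what arguments.
struct QueueCreationPlan
{
    bool useWithProperties;              // true: clCreateCommandQueueWithProperties (2.0+)
    cl_bits flags;                       // bitfield argument for clCreateCommandQueue (1.x)
    std::array<cl_bits, 3> propertyList; // zero-terminated key/value list for the 2.0 call
};

QueueCreationPlan planCommandQueue(int platformMajor, bool outOfOrderExec, bool profiling)
{
    cl_bits flags = 0;
    if (outOfOrderExec)
    {
        flags |= kOutOfOrderExec;
    }
    if (profiling)
    {
        flags |= kProfilingEnable;
    }

    QueueCreationPlan plan{};
    plan.useWithProperties = platformMajor >= 2;
    plan.flags = flags;
    plan.propertyList = {kQueueProperties, flags, 0};
    return plan;
}

// With the real API, the plan would be consumed roughly like this
// (with 2.x headers, define CL_USE_DEPRECATED_OPENCL_1_2_APIS before
// including CL/cl.h to silence the clCreateCommandQueue deprecation warning):
//   if (plan.useWithProperties)
//       queue = clCreateCommandQueueWithProperties(ctx, dev, plan.propertyList.data(), &err);
//   else
//       queue = clCreateCommandQueue(ctx, dev, plan.flags, &err);
```

Both branches carry the same flags, so switching APIs does not change which queue features (out-of-order execution, profiling) are requested.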