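cuDevicePrimaryCtxRetain returns CUDA_ERROR_INVALID_DEVICE during acc_init

Date: 2017-08-29 16:52:50

Tags: cuda openacc pgi

I am trying out the new PGI Community Edition (17.4) with a toy example (see below), and I get an error in the CUDA driver API when calling acc_init.

The code that reproduces the error is: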

#include <openacc.h>
#include <cuda_runtime_api.h>
#include <stdio.h>

int main()
{
   acc_init( acc_device_nvidia );

   int ndev = acc_get_num_devices( acc_device_nvidia );

   printf("Num OpenACC devices: %d\n", ndev);

   cudaGetDeviceCount(&ndev);

   printf("Num CUDA devices: %d\n", ndev);

   return 0;
}

Compiled with: /usr/local/pgi/linux86-64/17.4/bin/pgcc -acc -ta=tesla -Mcuda ./test.c -o oacc_test.pgi

cuda-memcheck output:

$ cuda-memcheck ./oacc_test.pgi 
========= CUDA-MEMCHECK
========= Program hit CUDA_ERROR_INVALID_DEVICE (error 101) due to "invalid device ordinal" on CUDA API call to cuDevicePrimaryCtxRetain. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuDevicePrimaryCtxRetain + 0x15c) [0x1e8d1c]
=========     Host Frame:/usr/local/pgi/linux86-64/17.4/lib/libaccnc.so (__pgi_uacc_cuda_initdev + 0x80b) [0x6f0b]
=========     Host Frame:/usr/local/pgi/linux86-64/17.4/lib/libaccg.so (__pgi_uacc_enumerate + 0x148) [0x11388]
=========     Host Frame:/usr/local/pgi/linux86-64/17.4/lib/libaccg.so (__pgi_uacc_initialize + 0x5b) [0x117ab]
=========     Host Frame:/usr/local/pgi/linux86-64/17.4/lib/libaccapi.so (acc_init + 0x22) [0xe4f2]
=========     Host Frame:./oacc_test.pgi [0xbc4]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf1) [0x202b1]
=========     Host Frame:./oacc_test.pgi [0xaca]
=========
Num OpenACC devices: 1
Num CUDA devices: 1
========= ERROR SUMMARY: 1 error

Apparently __pgi_uacc_cuda_initdev passes '-1' as the second argument (CUdevice dev) to cuDevicePrimaryCtxRetain (a bug?):

Breakpoint 1, 0x00007ffff4ab0bc0 in cuDevicePrimaryCtxRetain () from /usr/lib/x86_64-linux-gnu/libcuda.so
(cuda-gdb) p /x $rsi
$7 = 0xffffffff

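I assume this is not normal. Is this a bug in 17.4, or is my installation broken?

1 Answer:

Answer 0: (score: 3)

This is normal, and the error is benign. What is happening is that the PGI runtime is checking whether a CUDA context has already been created. Since there is no CUDA runtime call that only queries whether a context exists, we call "cuDevicePrimaryCtxRetain" instead. If it returns an error, we know that we need to create a new context.

Note that in PGI version 17.7 we did change this call, so you will no longer see the error when running under cuda-memcheck.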
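
For readers who want to check the context state themselves, the following is a minimal sketch of a probe that asks the driver whether a device's primary context is active, without retaining it. It is not the PGI runtime's code, and the answer does not say which call 17.7 switched to. The driver entry points cuInit, cuDeviceGet, and cuDevicePrimaryCtxGetState are real; the helper name primary_ctx_active and the cu_result / cu_device typedefs are hypothetical stand-ins, used so that the sketch builds without cuda.h by loading libcuda.so.1 at run time.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver API types, so the sketch builds
   without cuda.h. Assumption: CUresult is an enum with CUDA_SUCCESS == 0
   and CUdevice is a plain int, as in the CUDA driver API headers. */
typedef int cu_result;
typedef int cu_device;

typedef cu_result (*cu_init_fn)(unsigned int flags);
typedef cu_result (*cu_device_get_fn)(cu_device *dev, int ordinal);
typedef cu_result (*cu_ctx_state_fn)(cu_device dev, unsigned int *flags,
                                     int *active);

/* Returns 1 if the primary context of device `ordinal` is active,
   0 if it is not, and -1 if the driver could not be queried
   (no libcuda, cuInit failure, or an invalid device ordinal). */
int primary_ctx_active(int ordinal)
{
    /* Load the driver at run time; the handle is deliberately kept open
       for the lifetime of the process. */
    void *lib = dlopen("libcuda.so.1", RTLD_NOW);
    if (lib == NULL)
        return -1;

    cu_init_fn init = (cu_init_fn)dlsym(lib, "cuInit");
    cu_device_get_fn device_get = (cu_device_get_fn)dlsym(lib, "cuDeviceGet");
    cu_ctx_state_fn ctx_state =
        (cu_ctx_state_fn)dlsym(lib, "cuDevicePrimaryCtxGetState");
    if (init == NULL || device_get == NULL || ctx_state == NULL)
        return -1;

    cu_device dev = 0;
    unsigned int flags = 0;
    int active = 0;

    if (init(0) != 0)
        return -1;
    if (device_get(&dev, ordinal) != 0)   /* fails for ordinals like -1 */
        return -1;
    if (ctx_state(dev, &flags, &active) != 0)
        return -1;

    return active ? 1 : 0;
}
```

A caller would treat 0 as "no context yet, create one" and 1 as "a context already exists, reuse it", which mirrors the decision the answer describes. On older glibc versions the program may need to be linked with -ldl.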