OpenACC-compiled code: cuStreamSynchronize returns error 700

Posted: 2019-07-16 06:11:11

Tags: openacc

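I have compiled a program that uses simple OpenACC pragmas. It compiles fine, with no errors. However, when I run the program I get a generic "call to cuStreamSynchronize returned error 700: Illegal address during kernel execution" error.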

I ran cuda-memcheck and got the errors below. Can anyone help me pin down the problem?

========= CUDA-MEMCHECK
simpleGridingRatio: 300
========= Invalid __global__ read of size 4
=========     at 0x000007a8 in /home/forwardSolver/ChannelCppSolver.h:135:void linearDiscretization_135_gpu<double>(caseProp<double>&, std::vector<double, std::allocator<double>>&, std::vector<double, std::allocator<double>>&, std::vector<double, std::allocator<double>>&, std::vector<double, std::allocator<double>>&)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7ffca4f9a7b0 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x2fe) [0x28187e]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch3 + 0x1d59) [0x1a64a]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so [0x1b392]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch + 0x13a) [0x1b4ce]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccg.so (__pgi_uacc_launch + 0x1ff) [0x18f92]
=========     Host Frame:./ChannelCppProposal [0x2ffd5]
=========     Host Frame:./ChannelCppProposal [0x2dfe4]
=========     Host Frame:./ChannelCppProposal [0x2dd77]
=========     Host Frame:./ChannelCppProposal [0x2fcc5]
=========     Host Frame:./ChannelCppProposal [0x2eaf7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./ChannelCppProposal [0x65fa]
=========
========= Program hit CUDA_ERROR_LAUNCH_FAILED (error 719) due to "unspecified launch failure" on CUDA API call to cuStreamSynchronize. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuStreamSynchronize + 0x165) [0x281355]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch3 + 0x20c9) [0x1a9ba]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so [0x1b392]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch + 0x13a) [0x1b4ce]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccg.so (__pgi_uacc_launch + 0x1ff) [0x18f92]
=========     Host Frame:./ChannelCppProposal [0x2ffd5]
=========     Host Frame:./ChannelCppProposal [0x2dfe4]
=========     Host Frame:./ChannelCppProposal [0x2dd77]
=========     Host Frame:./ChannelCppProposal [0x2fcc5]
=========     Host Frame:./ChannelCppProposal [0x2eaf7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./ChannelCppProposal [0x65fa]
=========
Failing in Thread:1
========= Program hit CUDA_ERROR_LAUNCH_FAILED (error 719) due to "unspecified launch failure" on CUDA API call to cuCtxSynchronize. 
=========     Saved host backtrace up to driver entry point at error
call to cuStreamSynchronize returned error 719: Launch failed (often invalid pointer dereference)

=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuCtxSynchronize + 0x152) [0x258c22]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_error_handler + 0x258) [0xef30]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch3 + 0x20ec) [0x1a9dd]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so [0x1b392]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccn.so (__pgi_uacc_cuda_launch + 0x13a) [0x1b4ce]
=========     Host Frame:/opt/pgi/linux86-64-llvm/19.4/lib/libaccg.so (__pgi_uacc_launch + 0x1ff) [0x18f92]
=========     Host Frame:./ChannelCppProposal [0x2ffd5]
=========     Host Frame:./ChannelCppProposal [0x2dfe4]
=========     Host Frame:./ChannelCppProposal [0x2dd77]
=========     Host Frame:./ChannelCppProposal [0x2fcc5]
=========     Host Frame:./ChannelCppProposal [0x2eaf7]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:./ChannelCppProposal [0x65fa]
=========
========= ERROR SUMMARY: 3 errors

1 answer:

Answer 0 (score: 0)

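"Illegal address during kernel execution" is the device equivalent of a segmentation violation (segv) on the host: the kernel is dereferencing a bad address.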

I can't be certain, but "Address 0x7ffca4f9a7b0" looks to me like a host address.

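Also, judging from the signature of linearDiscretization_135_gpu, you appear to be using vectors in your code. How are you managing the data for these vectors? A vector is an opaque class containing three pointers. Because OpenACC data regions perform a shallow copy, putting a vector in a data clause copies only those pointers, not the data they point to. So, if I'm right that this is a host address, a likely cause is that you are copying the vector, which copies the host pointer addresses, and dereferencing those addresses on the device produces the illegal address error.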
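To make the shallow-versus-deep distinction concrete, here is a host-only sketch. `VecModel`, `shallow_copy`, and `deep_copy` are hypothetical names; the three-pointer layout mirrors what libstdc++ happens to use, but real `std::vector` layouts are implementation-defined. A bitwise copy of the object (roughly what a data clause does with the container itself) leaves the copy pointing at the original host buffer, while a deep copy allocates new storage for the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of a vector-like container (layout is an assumption).
struct VecModel {
    double* begin;  // first element
    double* end;    // one past the last element
    double* cap;    // one past the allocated storage
};

// Shallow copy: duplicates only the three pointer values, so the copy
// still refers to the original (host) element buffer.
VecModel shallow_copy(const VecModel& src) {
    VecModel dst;
    std::memcpy(&dst, &src, sizeof(VecModel));
    return dst;
}

// Deep copy: allocates fresh storage, copies the elements into it, and
// points the new object at that storage. On a GPU, this second buffer is
// what would have to live in device memory.
VecModel deep_copy(const VecModel& src) {
    const std::size_t n = static_cast<std::size_t>(src.end - src.begin);
    const std::size_t c = static_cast<std::size_t>(src.cap - src.begin);
    double* buf = new double[c];
    std::memcpy(buf, src.begin, n * sizeof(double));
    return VecModel{buf, buf + n, buf + c};
}
```

The caller owns the buffer returned by `deep_copy` and must release it with `delete[]`.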

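For vectors you need to do a manual deep copy or, if you are using PGI, try compiling with "-ta=tesla:managed" so that CUDA Unified Memory is used. That way the vector's pointers will be unified addresses that are accessible on both the host and the device.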
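A common way to sidestep the shallow-copy problem without writing a full deep copy (a different technique from the deep copy named above) is to never name the `std::vector` in a data clause and instead hand OpenACC the raw element buffer with an explicit shape. The sketch below assumes a hypothetical helper `scale_on_device`; it is not the asker's code. When built without OpenACC support, the pragma is ignored and the loop simply runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: scales every element of v, on the device when
// compiled with OpenACC (e.g. an assumed PGI build: pgc++ -acc -ta=tesla -Minfo=accel).
// The vector object itself never appears in a data clause; only its
// element buffer does, with an explicit length, so the elements are copied.
void scale_on_device(std::vector<double>& v, double factor) {
    double* data = v.data();         // host address of the element buffer
    const std::size_t n = v.size();  // element count, passed by value
    // copy(data[0:n]) moves the n elements to the device and back again.
    #pragma acc parallel loop copy(data[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;           // touches only the copied buffer
    }
}
```

With `-Minfo=accel`, the PGI compiler reports which data movements it generated, which is a quick way to confirm that no implicit copy of a vector object remains.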

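This is pure guesswork, so you may need to investigate further. You can try setting the environment variable PGI_ACC_DEBUG=1 (for PGI) or CRAY_ACC_DEBUG=1 (for Cray) to have the runtime print detailed information. I'm not sure whether GNU's OpenACC implementation has an equivalent environment variable.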

If you need more help investigating, please post a small reproducing example and we can see whether we can pin down the problem.