I launch my kernel and check for possible errors as follows:
kernel<<<grid,block>>>(d_Basis, d_repul_aux,nao);
cout<<"done with the ERIs...."<<endl;
std::string error = cudaGetErrorString(cudaPeekAtLastError());
cout<<error<<endl;
HANDLE_ERROR(cudaMemcpy(eris_gpu_cpu_aux.data(),d_repulsion_aux,eris_size*sizeof(double),cudaMemcpyDeviceToHost));
Here the kernel is checked for errors with cudaGetErrorString(cudaPeekAtLastError()), and HANDLE_ERROR is defined as:
static void HandleError( cudaError_t err,
                         const char *file,
                         int line ) {
    if (err != cudaSuccess) {
        printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
                file, line );
        exit( EXIT_FAILURE );
    }
}
#define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
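With the X server shut down, the computation runs as expected; but when the X server is running, the kernel hangs and I get the following output: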
done with the ERIs....
no error
the launch timed out and was terminated in main.cu at line 1038
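Line 1038 of the source code corresponds to:
HANDLE_ERROR(cudaMemcpy(eris_gpu_cpu_aux.data(),d_repulsion_aux,eris_size*sizeof(double),cudaMemcpyDeviceToHost));
which suggests that the computation crashes when the results are copied from the device to the host. I am using a GeForce GTX 480 graphics card and CUDA 7.5.
To work around the problem, I tried turning off the "Interactive" option in the /etc/X11/xorg.conf file, but the X server does not recognize this option. What should I do to share the GPU between the X server and GPGPU applications? I am stuck on this because I cannot write and/or debug my code in a text-mode environment.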
Answer 0 (score: 1)
My previous /etc/X11/xorg.conf file looked like this:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 319.21 (buildmeister@swio-display-x86-rhel47-14) Sun May 12 00:46:48 PDT 2013
Section "ServerLayout"
Identifier "Layout0"
Screen 0 "Screen0" 0 0
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"
EndSection
Section "Files"
EndSection
Section "InputDevice"
# generated from default
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/psaux"
Option "Emulate3Buttons" "no"
Option "ZAxisMapping" "4 5"
EndSection
Section "InputDevice"
# generated from default
Identifier "Keyboard0"
Driver "kbd"
EndSection
Section "Monitor"
Identifier "Monitor0"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
SubSection "Display"
Depth 24
EndSubSection
EndSection
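To solve the problem, the watchdog timeout has to be disabled by adding Option "Interactive" "false" to the "Device" section, as follows: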
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 319.21 (buildmeister@swio-display-x86-rhel47-14) Sun May 12 00:46:48 PDT 2013
Section "ServerLayout"
Identifier "Layout0"
Screen 0 "Screen0" 0 0
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"
EndSection
Section "Files"
EndSection
Section "InputDevice"
# generated from default
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/psaux"
Option "Emulate3Buttons" "no"
Option "ZAxisMapping" "4 5"
EndSection
Section "InputDevice"
# generated from default
Identifier "Keyboard0"
Driver "kbd"
EndSection
Section "Monitor"
Identifier "Monitor0"
VendorName "Unknown"
ModelName "Unknown"
HorizSync 28.0 - 33.0
VertRefresh 43.0 - 72.0
Option "DPMS"
EndSection
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
##
## disable watchdog timeouts for long-running CUDA kernels
##
Option "Interactive" "false"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
SubSection "Display"
Depth 24
EndSubSection
EndSection
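After restarting the X server, it may be worth confirming that the change took effect. Kernel launches are asynchronous, so cudaPeekAtLastError() called right after the launch only reports launch errors; a watchdog timeout surfaces at the next synchronizing call, which is why the question saw it at the cudaMemcpy. The following is a minimal sketch, assuming device 0 is the GPU that drives the display. CHECK, watchdogStatus and placeholderKernel are hypothetical names used only for illustration (CHECK mirrors HANDLE_ERROR from the question, and placeholderKernel stands in for the real kernel).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro with the same behavior as HANDLE_ERROR in the question:
// print the error string with file and line, then abort.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s in %s at line %d\n",                \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical helper: turns the watchdog attribute into readable text.
const char *watchdogStatus(int kernelExecTimeout) {
    return kernelExecTimeout ? "enabled" : "disabled";
}

// Hypothetical stand-in for the real kernel from the question.
__global__ void placeholderKernel(double *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0 * i;
}

int main() {
    const int device = 0;  // assumption: the display GPU is device 0

    // Non-zero means the driver enforces a run time limit on kernels.
    int timeoutEnabled = 0;
    CHECK(cudaDeviceGetAttribute(&timeoutEnabled,
                                 cudaDevAttrKernelExecTimeout, device));
    std::printf("kernel run time limit on device %d: %s\n",
                device, watchdogStatus(timeoutEnabled));

    const int n = 1 << 20;
    double *d_out = nullptr;
    CHECK(cudaMalloc(&d_out, n * sizeof(double)));

    placeholderKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    CHECK(cudaGetLastError());       // catches launch/configuration errors
    CHECK(cudaDeviceSynchronize());  // catches execution errors, e.g. a timeout

    CHECK(cudaFree(d_out));
    return 0;
}
```

If the run time limit is still reported as enabled, the option was probably not picked up by the driver. Synchronizing right after the launch also means that a timeout is attributed to the kernel itself rather than to a later memory copy.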