NSight debugging is very slow for any memory allocation

Time: 2019-06-29 16:11:33

Tags: cuda nsight

I am trying to debug a kernel that uses some surface objects and CUDA arrays. For this I use NVIDIA NSight with VS2017 and debug in Next-Gen mode. However, the cudaMallocArray calls, which are required before the kernel launch, take forever to run (I waited 10 minutes, then cancelled the run). Is there any way to fix this?

Below is minimal code that does nothing but allocate the arrays, and it still takes forever to execute under NSight debugging.

main.cpp:

    #include <cuda_runtime.h>

    int main()
    {
        int width = 800;
        int height = 600;

        // Allocate CUDA arrays in device memory
        cudaChannelFormatDesc colorDescription = cudaCreateChannelDesc(8, 8, 8, 8, cudaChannelFormatKindUnsigned);
        cudaChannelFormatDesc depthDescription = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

        cudaArray *colorArray;
        cudaMallocArray(&colorArray, &colorDescription, width, height, cudaArraySurfaceLoadStore);

        cudaArray *depthArray;
        cudaMallocArray(&depthArray, &depthDescription, width, height, cudaArraySurfaceLoadStore);

        return 0;
    }

Update

I replaced the surfaces with a plain cudaMalloc, and the result is the same.


Spec details:

  • System: Windows 10 Pro, 64-bit (v. 1803, build 17134.829)
  • GPU: GeForce GTX 780 (plain, no "Ti" or anything)
  • GPU driver: 430.86
  • CUDA v.10.1.168
  • Nsight v.2019.2.0.19109

Edit 2:

On further investigation, I paused my simple program (the version that uses cudaMalloc instead of the array allocations), and the call stack shows the following:

    ntdll.dll!00007ffc9b69b1e4() Unknown
    kernel32.dll!00007ffc9aecb093() Unknown
    kernel32.dll!00007ffc9af096f5() Unknown
    nvcuda.dll!00007ffc4037a38c() Unknown
    nvcuda.dll!00007ffc4037a532() Unknown
    nvcuda.dll!00007ffc40379dae() Unknown
    nvcuda.dll!00007ffc40377b05() Unknown
    nvcuda.dll!00007ffc40374515() Unknown
    nvcuda.dll!00007ffc405cd13b() Unknown
    nvcuda.dll!00007ffc40442807() Unknown
    nvcuda.dll!00007ffc4054bb84() Unknown
    nvcuda.dll!00007ffc4055abed() Unknown
    nvcuda.dll!00007ffc4055aee6() Unknown
    nvcuda.dll!00007ffc4055a022() Unknown
    nvcuda.dll!00007ffc4054b163() Unknown
    nvcuda.dll!00007ffc4040b4c7() Unknown
    nvcuda.dll!00007ffc4040ea85() Unknown
    nvcuda.dll!00007ffc4030588c() Unknown
    nvcuda.dll!00007ffc4049a3e8() Unknown
    NSightSlow.exe!cudart::contextStateManager::initPrimaryContext(struct cudart::device *) C++
    NSightSlow.exe!cudart::contextStateManager::tryInitPrimaryContext(struct cudart::device *) C++
    NSightSlow.exe!cudart::contextStateManager::initDriverContext(void) C++
    NSightSlow.exe!cudart::contextStateManager::getRuntimeContextState(class cudart::contextState * *,bool) C++
    NSightSlow.exe!cudart::doLazyInitContextState(void) C++
    NSightSlow.exe!cudart::cudaApiMalloc(void * *,unsigned __int64) C++
    NSightSlow.exe!cudaMalloc() C++
    > NSightSlow.exe!main() Line 10 C++
    NSightSlow.exe!invoke_main() Line 79 C++
    NSightSlow.exe!__scrt_common_main_seh() Line 288 C++
    NSightSlow.exe!__scrt_common_main() Line 331 C++
    NSightSlow.exe!mainCRTStartup() Line 17 C++
    kernel32.dll!00007ffc9aec4034() Unknown
    ntdll.dll!00007ffc9b6d3691() Unknown

This seems to be related to the (lazy) context creation that happens on the first CUDA function call. It is probably unrelated to the allocation call itself.

1 answer:

Answer 0: (score: 1)

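I got feedback from an NVIDIA employee: Next-Gen debugging does not support the Kepler architecture, which includes my GTX 780. There should have been a proper error message, but there was none.

A detailed list of what is supported is available here: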

https://developer.nvidia.com/nsight-visual-studio-edition-supported-gpus-full-list#SupportedComputeConfigs
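Since no error message is shown, a program can guard against this case itself. The sketch below assumes the compute-capability major and minor numbers have already been obtained, for example from the `major` and `minor` fields of `cudaDeviceProp` filled in by `cudaGetDeviceProperties`. The function names `isKepler` and `debuggerHint` are hypothetical. Kepler GPUs report compute capability 3.x (the GTX 780 is 3.5).

```cpp
#include <string>

// Hypothetical helper: compute capability 3.x is the Kepler family.
bool isKepler(int ccMajor)
{
    return ccMajor == 3;
}

// Hypothetical helper: builds a short hint about Next-Gen debugging
// based only on what the answer above states about Kepler.
std::string debuggerHint(int ccMajor, int ccMinor)
{
    const std::string cc = std::to_string(ccMajor) + "." + std::to_string(ccMinor);
    if (isKepler(ccMajor)) {
        return "Compute capability " + cc +
               " is Kepler: Next-Gen debugging is reported as unsupported.";
    }
    return "Compute capability " + cc +
           " is not Kepler: check NVIDIA's supported-GPU list.";
}
```

For the GTX 780 from the question, `debuggerHint(3, 5)` returns the Kepler message. For any other architecture the function only points back to NVIDIA's list, because the answer does not say which other GPUs are supported.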