I am trying to debug a kernel that uses some surface objects and CUDA arrays. For this I am using NVIDIA NSight with VS2017, debugging in Next-Gen mode. However, the cudaMallocArray calls, which are required before the kernel launch, take a very long time to run (I waited 10 minutes and then cancelled the run). Is there any way around this?
The following minimal code, which does nothing but allocate the arrays, takes a very long time to execute under NSight debugging.
main.cpp:
    #include <cuda_runtime.h>
    int main() {
        int width = 800;
        int height = 600;
        // Allocate CUDA arrays in device memory
        cudaChannelFormatDesc colorDescription = cudaCreateChannelDesc(8, 8, 8, 8, cudaChannelFormatKindUnsigned);
        cudaChannelFormatDesc depthDescription = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
        cudaArray *colorArray;
        cudaMallocArray(&colorArray, &colorDescription, width, height, cudaArraySurfaceLoadStore);
        cudaArray *depthArray;
        cudaMallocArray(&depthArray, &depthDescription, width, height, cudaArraySurfaceLoadStore);
        return 0;
    }
Update:
I managed to replace the surfaces with a plain cudaMalloc, and the result is the same.
Spec details:
Edit 2:
On further investigation, I paused my simple program (the version that uses cudaMalloc instead of the arrays), and the call stack shows the following:
    ntdll.dll!00007ffc9b69b1e4() Unknown
    kernel32.dll!00007ffc9aecb093() Unknown
    kernel32.dll!00007ffc9af096f5() Unknown
    nvcuda.dll!00007ffc4037a38c() Unknown
    nvcuda.dll!00007ffc4037a532() Unknown
    nvcuda.dll!00007ffc40379dae() Unknown
    nvcuda.dll!00007ffc40377b05() Unknown
    nvcuda.dll!00007ffc40374515() Unknown
    nvcuda.dll!00007ffc405cd13b() Unknown
    nvcuda.dll!00007ffc40442807() Unknown
    nvcuda.dll!00007ffc4054bb84() Unknown
    nvcuda.dll!00007ffc4055abed() Unknown
    nvcuda.dll!00007ffc4055aee6() Unknown
    nvcuda.dll!00007ffc4055a022() Unknown
    nvcuda.dll!00007ffc4054b163() Unknown
    nvcuda.dll!00007ffc4040b4c7() Unknown
    nvcuda.dll!00007ffc4040ea85() Unknown
    nvcuda.dll!00007ffc4030588c() Unknown
    nvcuda.dll!00007ffc4049a3e8() Unknown
    NSightSlow.exe!cudart::contextStateManager::initPrimaryContext(struct cudart::device *) C++
    NSightSlow.exe!cudart::contextStateManager::tryInitPrimaryContext(struct cudart::device *) C++
    NSightSlow.exe!cudart::contextStateManager::initDriverContext(void) C++
    NSightSlow.exe!cudart::contextStateManager::getRuntimeContextState(class cudart::contextState * *,bool) C++
    NSightSlow.exe!cudart::doLazyInitContextState(void) C++
    NSightSlow.exe!cudart::cudaApiMalloc(void * *,unsigned __int64) C++
    NSightSlow.exe!cudaMalloc() C++
    > NSightSlow.exe!main() Line 10 C++
    NSightSlow.exe!invoke_main() Line 79 C++
    NSightSlow.exe!__scrt_common_main_seh() Line 288 C++
    NSightSlow.exe!__scrt_common_main() Line 331 C++
    NSightSlow.exe!mainCRTStartup() Line 17 C++
    kernel32.dll!00007ffc9aec4034() Unknown
    ntdll.dll!00007ffc9b6d3691() Unknown
It seems to be related to the (lazy) context creation on the first CUDA function call. It is probably unrelated to cudaMalloc itself.
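One way to test the suspicion that the time goes into lazy context creation, rather than into the allocation, is to time a first trivial runtime call separately from the allocation that follows it. The sketch below contains only a portable timing helper; elapsedMs is a hypothetical name, not part of CUDA, and the commented usage assumes the common idiom of calling cudaFree(0) as the first runtime call so that context creation happens there.

```cpp
#include <chrono>

// Hypothetical helper (not a CUDA API): runs fn once and returns the
// wall-clock time it took, in milliseconds.
template <typename Fn>
double elapsedMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Assumed usage inside main() of a program that includes <cuda_runtime.h>:
//
//   void *ptr = nullptr;
//   // First runtime call: expected to pay for the lazy context creation.
//   double initMs  = elapsedMs([] { cudaFree(0); });
//   // Second runtime call: the context already exists at this point.
//   double allocMs = elapsedMs([&] { cudaMalloc(&ptr, 1024); });
```

If initMs accounts for nearly all of the delay and allocMs is small, the allocation itself is not the bottleneck, which would match the call stack ending in initPrimaryContext.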
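Answer 0 (score: 1)
I got feedback from an NVIDIA employee that Next-Gen mode does not support the Kepler architecture, which includes my GTX780. There should have been a proper error message, but there was none.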
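Since the debugger gave no error message, a program can perform a rough check of its own before a debugging session. The sketch below is only an illustration: isKepler and nextGenHint are hypothetical helper names, and it relies on Kepler GPUs reporting compute capability 3.x (the GTX 780 reports 3.5). It flags only Kepler, as stated in the answer, and says nothing about other architectures.

```cpp
#include <string>

// Hypothetical helper: Kepler GPUs report a compute capability of 3.x.
bool isKepler(int major) {
    return major == 3;
}

// Hypothetical helper: builds a warning for Kepler devices and returns
// an empty string for every other architecture.
std::string nextGenHint(const std::string &name, int major, int minor) {
    if (!isKepler(major)) {
        return "";
    }
    return name + " (compute capability " + std::to_string(major) + "." +
           std::to_string(minor) +
           ") is a Kepler GPU; Next-Gen debugging is reported as unsupported.";
}

// Assumed usage in a program that includes <cuda_runtime.h>:
//
//   cudaDeviceProp prop;
//   cudaGetDeviceProperties(&prop, 0);  // device 0
//   std::string hint = nextGenHint(prop.name, prop.major, prop.minor);
```

Keeping the check as a plain function of the major and minor numbers means it can be exercised without a GPU; only the commented part needs the CUDA runtime.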
A detailed list of what is supported can be found at: