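I'm running into a problem when debugging thrust::sort_by_key with the Nsight memory checker enabled.
I am using MS Windows 7 and MS VS 2010; the graphics card is a GeForce 840M.
To reproduce, build the following as Debug/Win32 and run it under the debugger with the memory checker enabled: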
    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"
    #include "thrust/sort.h"
    #include "thrust/device_ptr.h"

    // Copies keys (a) and values (b) to the device and sorts the values by key.
    void sortCu(const int *a, const int *b, unsigned int size) {
        int *dev_a = 0;
        int *dev_b = 0;
        cudaSetDevice(0);
        cudaMalloc((void**)&dev_a, size * sizeof(int));
        cudaMalloc((void**)&dev_b, size * sizeof(int));
        cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
        // Wrap the raw device pointers so Thrust dispatches to the CUDA backend.
        thrust::device_ptr<int> d_aRob(dev_a);
        thrust::device_ptr<int> d_bRob(dev_b);
        thrust::sort_by_key(d_aRob, d_aRob + size, d_bRob);
    }

    int main() {
        const int h_keys[5] = { 0, 0, 0, 0, 0 };
        const int h_vals[5] = { 1, 2, 3, 4, 5 };
        sortCu(h_keys, h_vals, 5);
        return 0;
    }
At first, I got this:
CUDA context created : 034e6a38
CUDA module loaded: 05b2b290 F:/DX/cuTest/cuTest/kernel.cu
CUDA context created : 036a6a38
CUDA module loaded: 059ab280 F:/DX/cuTest/cuTest/kernel.cu
Internal debugger error occurred while attempting to launch _ZN6thrust6system4cuda6detail6detail11b40c_thrust15RakingReductionIjiLi0ELi4ELi0ENS4_20PreprocessKeyFunctorIiEEEEvPbPiPT_SB_NS4_16CtaDecompositionE in CUcontext 0x036a6a38, CUmodule 0x059ab280:
code patching failed due to lack of code patching memory.
Please increase Nsight|Options|CUDA|Code Patching Memory and try again.
All breakpoints for function _ZN6thrust6system4cuda6detail6detail11b40c_thrust15RakingReductionIjiLi0ELi4ELi0ENS4_20PreprocessKeyFunctorIiEEEEvPbPiPT_SB_NS4_16CtaDecompositionE have been removed.
See Output View for additional messages of this type.
CUDA Debugger detected HW exception on 1 warps. First warp:
blockIdx = {0,0,0}
threadIdx = {64,0,0}
Exception = Out of range Register
PC = 0x001d4c08
FunctionRelativePC = _ZN6thrust6system4cuda6detail6detail11b40c_thrust13SrtsScanSpineIvEEvPiS6_i+000648
So I increased Nsight|Options|CUDA|Code Patching Memory to 1000, and then I got this:
CUDA context created : 03586a38
CUDA module loaded: 05abb290 F:/DX/cuTest/cuTest/kernel.cu
CUDA Debugger detected HW exception on 1 warps. First warp:
blockIdx = {0,0,0}
threadIdx = {96,0,0}
Exception = Out of range Register
PC = 0x001d4c08
FunctionRelativePC = _ZN6thrust6system4cuda6detail6detail11b40c_thrust13SrtsScanSpineIvEEvPiS6_i+000648
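If I disable the memory checker, everything works fine.
So, how can I avoid this?
To be clear, I don't want to debug memory problems in the sort itself; I want to debug the code that runs after the sort. (This is a simplified example. In the real project I sort an array of 1M elements and then call a complex kernel, which is what I want to debug.)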