thrust::sort_by_key raises an error when debugging with the memory checker enabled

Time: 2015-04-18 11:16:34

Tags: debugging cuda thrust

I run into a problem when debugging thrust::sort_by_key with the memory checker enabled.

I am using MS Windows 7 and MS VS 2010; the graphics card is a GeForce 840M.

To reproduce, run the following in a 32-bit Debug build with the memory checker enabled:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <thrust/sort.h>
#include <thrust/device_ptr.h>

// Copies the keys (a) and values (b) to the device and sorts the values by key there.
void sortCu(const int *a, const int *b, unsigned int size) {
    int *dev_a = 0;
    int *dev_b = 0;

    cudaSetDevice(0);

    // Allocate the device buffers and upload the keys and values.
    cudaMalloc((void**)&dev_a, size * sizeof(int));
    cudaMalloc((void**)&dev_b, size * sizeof(int));
    cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);

    // Wrap the raw device pointers so that Thrust sorts on the device.
    thrust::device_ptr<int> d_aRob(dev_a);
    thrust::device_ptr<int> d_bRob(dev_b);
    thrust::sort_by_key(d_aRob, d_aRob + size, d_bRob);

    // Release the device buffers.
    cudaFree(dev_a);
    cudaFree(dev_b);
}

int main() {
    const int h_keys[5] = { 0, 0, 0, 0, 0 };
    const int h_vals[5] = { 1, 2, 3, 4, 5 };
    sortCu(h_keys, h_vals, 5);

    return 0;
}

At first, I got this:

CUDA context created : 034e6a38
CUDA module loaded:   05b2b290 F:/DX/cuTest/cuTest/kernel.cu
CUDA context created : 036a6a38
CUDA module loaded:   059ab280 F:/DX/cuTest/cuTest/kernel.cu
Internal debugger error occurred while attempting to launch _ZN6thrust6system4cuda6detail6detail11b40c_thrust15RakingReductionIjiLi0ELi4ELi0ENS4_20PreprocessKeyFunctorIiEEEEvPbPiPT_SB_NS4_16CtaDecompositionE in CUcontext 0x036a6a38, CUmodule 0x059ab280:
code patching failed due to lack of code patching memory.
Please increase Nsight|Options|CUDA|Code Patching Memory and try again.
All breakpoints for function _ZN6thrust6system4cuda6detail6detail11b40c_thrust15RakingReductionIjiLi0ELi4ELi0ENS4_20PreprocessKeyFunctorIiEEEEvPbPiPT_SB_NS4_16CtaDecompositionE have been removed.
See Output View for additional messages of this type.
CUDA Debugger detected HW exception on 1 warps.  First warp:
blockIdx = {0,0,0}
threadIdx = {64,0,0}
Exception = Out of range Register
PC = 0x001d4c08
FunctionRelativePC = _ZN6thrust6system4cuda6detail6detail11b40c_thrust13SrtsScanSpineIvEEvPiS6_i+000648

So I increased Nsight|Options|CUDA|Code Patching Memory to 1000, and then I got this:

CUDA context created : 03586a38
CUDA module loaded:   05abb290 F:/DX/cuTest/cuTest/kernel.cu
CUDA Debugger detected HW exception on 1 warps.  First warp:
blockIdx = {0,0,0}
threadIdx = {96,0,0}
Exception = Out of range Register
PC = 0x001d4c08
FunctionRelativePC = _ZN6thrust6system4cuda6detail6detail11b40c_thrust13SrtsScanSpineIvEEvPiS6_i+000648

With the memory checker disabled, everything works fine.

So, how can I avoid this?

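I am not trying to debug a memory problem. What I actually want is to debug the code that runs after the sort. (This is a simplified example: in the real project I sort an array of 1M elements and then call a complex kernel, and that kernel is what I want to debug.)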
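One possible stopgap is to do the sort on the host while the memory checker is enabled, then upload the sorted arrays before launching the kernel under test. This is only a sketch. It assumes a slower host-side sort is acceptable in debug runs and that the data fits in host memory. The helper name hostSortByKey is hypothetical and not part of Thrust. It also keeps equal keys in their original order, which thrust::sort_by_key does not promise.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Thrust API): sorts `keys` in ascending order on
// the host and applies the same permutation to `vals`.
// Assumes keys.size() == vals.size(). Equal keys keep their original order.
void hostSortByKey(std::vector<int>& keys, std::vector<int>& vals) {
    const std::size_t n = keys.size();

    // Start from the identity permutation 0, 1, ..., n-1.
    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) {
        order[i] = i;
    }

    // Sort the indices by the key they refer to; stable_sort preserves the
    // relative order of indices whose keys compare equal.
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t i, std::size_t j) {
                         return keys[i] < keys[j];
                     });

    // Gather keys and values through the sorted permutation.
    std::vector<int> sortedKeys(n);
    std::vector<int> sortedVals(n);
    for (std::size_t i = 0; i < n; ++i) {
        sortedKeys[i] = keys[order[i]];
        sortedVals[i] = vals[order[i]];
    }
    keys.swap(sortedKeys);
    vals.swap(sortedVals);
}
```

The sorted vectors could then be copied to the device with cudaMemcpy as in sortCu above. That keeps the Thrust sorting kernels out of the memory-checked run while leaving the kernel under test on the GPU.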

0 answers:

No answers