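I am trying to allocate and initialize 2D arrays using cudaMallocPitch() and cudaMemcpy2D(). I have been able to allocate several arrays with these APIs, but one particular array makes my program fail.
My code is: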
int size = totalPat * trainingSize * wordSize; // 65 * 672 * 15
char ** h_pattern = (char**) malloc((size_t) 40 * sizeof(char));
for(int i = 0; i < 40; i++){
h_pattern[i] = (char*) malloc((size_t) size * sizeof(char));
fill_n(h_pattern[i], size, '\0');
}
char * d_pattern;
size_t dpitch;
size_t spitch = size * sizeof(char);
cudaMallocPitch(&d_pattern, &dpitch, spitch, 40);
cudaMemcpy2D(d_pattern, dpitch, h_pattern, spitch, spitch, 40, cudaMemcpyHostToDevice);
I used cuda-gdb to debug the program and locate the problem, and it keeps seg faulting inside cudaMemcpy2D(). The backtrace gives the following output:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff501dd00 in cudbgGetAPIVersion () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(cuda-gdb) backtrace
#0 0x00007ffff501dd00 in cudbgGetAPIVersion () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007ffff4efc68e in cuMemGetAttribute_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff4f0cc7f in cuMemGetAttribute_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff4efd7f1 in cuMemGetAttribute_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff4e6b322 in cuMemGetAttribute_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007ffff4e74b38 in cuMemGetAttribute_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007ffff4e4d92a in cuMemcpy2DUnaligned_v2 () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x000000000045bc5d in cudart::driverHelper::memcpy2DPtr(char*, unsigned long, char const*, unsigned long, unsigned long, unsigned long, cudaMemcpyKind, CUstream_st*, bool, bool) ()
#8 0x0000000000435039 in cudart::cudaApiMemcpy2DCommon(void*, unsigned long, void const*, unsigned long, unsigned long, unsigned long, cudaMemcpyKind, bool) ()
#9 0x00000000004350f8 in cudart::cudaApiMemcpy2D(void*, unsigned long, void const*, unsigned long, unsigned long, unsigned long, cudaMemcpyKind) ()
#10 0x0000000000462073 in cudaMemcpy2D ()
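There is a question on the devtalk forum about a pitch limit, where cudaMemcpy2D() fails for a pitch greater than 2^18, but that question is from 2007 and I assume the limit no longer exists. The documentation also says that cudaMemcpy2D() returns an error if dpitch or spitch exceeds the maximum allowed value, but it does not say what that maximum is.
Any help is greatly appreciated.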
Answer 0 (score: 1)
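Your code asks cudaMemcpy2D() to copy 40 * size bytes of char data, but h_pattern is a char** that points to a host allocation of only 40 bytes. The 40 rows were malloc'ed separately, so they do not lie one after another starting at h_pattern, and the copy reads far past the end of that allocation.
Instead, you need to malloc one linear block of host memory that holds all 40 patterns, i.e. a single char* buffer of 40 * size bytes.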
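A single-buffer host layout could look like the sketch below. It is a minimal example under stated assumptions, not the asker's full program: the dimensions are taken from the comment in the question (65 * 672 * 15), the `check` helper is hypothetical and only there to surface the error codes returned by the runtime, and device 0 is assumed when printing `cudaDeviceProp::memPitch`, which reports the maximum pitch in bytes that memory copies allow on that device.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): print the error and exit.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main() {
    const size_t rows = 40;             // number of patterns, as in the question
    const size_t size = 65 * 672 * 15;  // totalPat * trainingSize * wordSize (values assumed from the question)

    // Assumption: device 0. memPitch is the maximum pitch allowed by memory copies.
    cudaDeviceProp prop;
    check(cudaGetDeviceProperties(&prop, 0), "cudaGetDeviceProperties");
    printf("max pitch: %zu bytes\n", prop.memPitch);

    // One contiguous host buffer: row i starts at h_pattern + i * spitch.
    const size_t spitch = size * sizeof(char);
    char *h_pattern = (char *) malloc(rows * spitch);
    if (h_pattern == NULL) {
        fprintf(stderr, "host malloc failed\n");
        return EXIT_FAILURE;
    }
    memset(h_pattern, 0, rows * spitch);

    // Pitched device allocation; dpitch may be larger than spitch.
    char *d_pattern = NULL;
    size_t dpitch = 0;
    check(cudaMallocPitch((void **) &d_pattern, &dpitch, spitch, rows),
          "cudaMallocPitch");

    // Copy rows of spitch bytes each; the source really holds rows * spitch bytes now.
    check(cudaMemcpy2D(d_pattern, dpitch, h_pattern, spitch, spitch, rows,
                       cudaMemcpyHostToDevice),
          "cudaMemcpy2D");

    check(cudaFree(d_pattern), "cudaFree");
    free(h_pattern);
    return 0;
}
```

With this layout, pattern i is addressed as h_pattern + i * spitch on the host and as d_pattern + i * dpitch on the device, so any code that previously used h_pattern[i] as a row pointer has to be adjusted accordingly.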