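I am trying to do CUDA processing in Python 3 with the PyCUDA package, but I run into trouble when I call malloc inside a kernel. If the number of rows in maximum_array is less than 10000, everything works. When the number of rows exceeds 100000, I get the error "cuMemFree failed". How can I fix this?
Error text: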
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
Kernel function:
__global__ void coordinate_calculation (int *minimize_data, int size_minimize_data_x, int size_minimize_data_y)
{
    int t_idx = threadIdx.x;
    int t_idy = threadIdx.y;
    int b_idx = blockIdx.x;
    int b_idy = blockIdx.y;
    int b_dimx = blockDim.x;
    int b_dimy = blockDim.y;
    int g_dimx = gridDim.x;
    int block_index = b_idx+b_idy*g_dimx;
    int thread_index = t_idx+t_idy*b_dimx;
    int id_global = block_index*b_dimx*b_dimy+thread_index;
    int event_count = size_minimize_data_y;
    if (id_global>event_count-1)
    {
        return;
    }
    float *matrix=(float*)malloc(15*sizeof(float));
}
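For context, one possible explanation, not confirmed by anything in this post: in-kernel malloc draws from a fixed-size device heap, which the CUDA programming guide documents as 8 MB by default, and it returns NULL once that heap is exhausted. The kernel above never calls free and never checks the returned pointer, so if the full kernel writes through matrix in a thread whose allocation failed, that write would be an illegal memory access. The sketch below is plain Python arithmetic with hypothetical helper names. It only compares the requested payload with the default heap and ignores allocator overhead and alignment, which are not known here.

```python
# Rough estimate of in-kernel malloc demand. All names here are illustrative
# assumptions; only the 15-float allocation comes from the kernel in the post.
DEFAULT_DEVICE_HEAP_BYTES = 8 * 1024 * 1024  # documented CUDA default malloc heap
FLOATS_PER_THREAD = 15                       # malloc(15*sizeof(float)) per thread
SIZEOF_FLOAT = 4                             # bytes in a CUDA float


def heap_payload_bytes(row_count: int) -> int:
    """Bytes requested by all threads that pass the bounds check (one per row)."""
    return row_count * FLOATS_PER_THREAD * SIZEOF_FLOAT


def heap_fraction(row_count: int) -> float:
    """Requested payload as a fraction of the default heap, overhead ignored."""
    return heap_payload_bytes(row_count) / DEFAULT_DEVICE_HEAP_BYTES


for rows in (10_000, 100_000, 200_000):
    print(rows, heap_payload_bytes(rows), round(heap_fraction(rows), 3))
```

If heap exhaustion is indeed the cause, the usual remedies are to check the pointer for NULL, call free(matrix) before the kernel returns, replace the malloc with a fixed-size local array such as float matrix[15];, or raise the heap limit before the module is loaded. PyCUDA documents Context.set_limit together with limit.MALLOC_HEAP_SIZE for the last option, though that is untested against this setup.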
Python function:
import numpy as np
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

moments_count = minimize_data.shape[0]
block_x_size = 32
block_y_size = 32
# Square grid with enough 32x32 blocks to give every row its own thread.
grid_x_size = int(
    (moments_count / (block_x_size * block_y_size)) ** 0.5) + 1
grid_y_size = int(
    (moments_count / (block_x_size * block_y_size)) ** 0.5) + 1
minimize_data = minimize_data.astype(np.uint32)
minimize_data_gpu = cuda.mem_alloc(minimize_data.nbytes)
cuda.memcpy_htod(minimize_data_gpu, minimize_data)
core = get_core()
mod = SourceModule(core)
func = mod.get_function('coordinate_calculation')
func(minimize_data_gpu, np.uint32(minimize_data.shape[1]),
     np.uint32(minimize_data.shape[0]),
     block=(block_x_size, block_y_size, 1),
     grid=(grid_x_size, grid_y_size, 1))
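The launch configuration above uses a square grid of 32x32 blocks, so it usually starts more threads than there are rows, which is why the kernel needs its bounds check. The following is a minimal sketch of the same sizing done with integer arithmetic. The helper names square_grid and threads_launched are hypothetical, and math.isqrt assumes Python 3.8 or newer.

```python
import math


def square_grid(row_count, block_x=32, block_y=32):
    """Smallest square grid whose blocks provide at least one thread per row."""
    threads_per_block = block_x * block_y
    # Ceiling division: number of blocks needed to cover every row.
    blocks_needed = max(1, -(-row_count // threads_per_block))
    # Ceiling of the square root of blocks_needed, without floating point.
    side = math.isqrt(blocks_needed - 1) + 1
    return side, side


def threads_launched(grid, block_x=32, block_y=32):
    """Total threads started for a given 2D grid of block_x by block_y blocks."""
    return grid[0] * grid[1] * block_x * block_y


grid = square_grid(100_000)
print(grid, threads_launched(grid))
```

Unlike the float-based formula in the post, this never adds an extra row and column of blocks when the block count is already a perfect square, but threads beyond the row count still have to return early in the kernel.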