PyCUDA: cuMemFree failed

Asked: 2019-05-21 10:53:29

Tags: python c cuda pycuda

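I am trying to use CUDA processing from Python 3 with the PyCUDA package, but I run into trouble when I call malloc inside a kernel. When maximum_array has fewer than 10000 rows, everything works. When the row count exceeds 100000, I get the error "cuMemFree failed". How can I fix this?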

Error text:

PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered

Kernel function:

__global__ void coordinate_calculation (int *minimize_data, int size_minimize_data_x, int size_minimize_data_y)
{
   int t_idx = threadIdx.x;
   int t_idy = threadIdx.y;
   int b_idx = blockIdx.x;
   int b_idy = blockIdx.y;
   int b_dimx = blockDim.x;
   int b_dimy = blockDim.y;
   int g_dimx = gridDim.x;
   int block_index = b_idx+b_idy*g_dimx;
   int thread_index = t_idx+t_idy*b_dimx;
   int id_global = block_index*b_dimx*b_dimy+thread_index;

   int event_count = size_minimize_data_y;

   if (id_global>event_count-1)
   {
       return;
   }

   float *matrix=(float*)malloc(15*sizeof(float));
}

Python function:

moments_count = minimize_data.shape[0]
block_x_size = 32
block_y_size = 32
grid_x_size = int(
    (moments_count / (block_x_size * block_y_size)) ** 0.5) + 1
grid_y_size = int(
    (moments_count / (block_x_size * block_y_size)) ** 0.5) + 1

minimize_data = minimize_data.astype(np.uint32)
minimize_data_gpu = cuda.mem_alloc(minimize_data.nbytes)
cuda.memcpy_htod(minimize_data_gpu, minimize_data)

core = get_core()
mod = SourceModule(core)
func = mod.get_function('coordinate_calculation')
func(minimize_data_gpu, np.uint32(minimize_data.shape[1]),
     np.uint32(minimize_data.shape[0]),
     block=(block_x_size, block_y_size, 1),
     grid=(grid_x_size, grid_y_size, 1))
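
To make the two row counts easier to compare, here is a rough sketch in plain Python (no GPU needed) that repeats the grid arithmetic above and adds up the raw bytes the kernel's per-thread malloc would request. The helper names `launch_geometry` and `heap_demand_bytes` are hypothetical and only for illustration. The 8 MB figure is the default device malloc heap size (`cudaLimitMallocHeapSize`) as I understand it from the CUDA programming guide, so treat it as an assumption to check against your setup. The sum ignores any per-allocation overhead or alignment in the device allocator, so it is only a lower bound on the real demand.

```python
# Sketch only: mirrors the launch arithmetic from the question and estimates
# the raw in-kernel malloc demand. Helper names are hypothetical.

# Assumed default size of the device malloc heap (cudaLimitMallocHeapSize).
DEFAULT_MALLOC_HEAP_BYTES = 8 * 1024 * 1024


def launch_geometry(moments_count, block_x=32, block_y=32):
    """Return (grid_side, launched_threads) for a square grid of 2D blocks."""
    threads_per_block = block_x * block_y
    # Same formula as grid_x_size / grid_y_size in the question.
    grid_side = int((moments_count / threads_per_block) ** 0.5) + 1
    launched_threads = grid_side * grid_side * threads_per_block
    return grid_side, launched_threads


def heap_demand_bytes(active_threads, floats_per_thread=15, float_size=4):
    """Raw bytes requested if every active thread mallocs 15 floats."""
    return active_threads * floats_per_thread * float_size


if __name__ == "__main__":
    for rows in (10_000, 100_000):
        grid_side, launched = launch_geometry(rows)
        # Threads with id_global >= rows return before calling malloc,
        # so only `rows` threads actually allocate.
        demand = heap_demand_bytes(rows)
        share = demand / DEFAULT_MALLOC_HEAP_BYTES
        print(f"rows={rows}: grid={grid_side}x{grid_side}, "
              f"launched={launched}, raw demand={demand} bytes "
              f"({share:.0%} of the assumed default heap)")
```

If the real demand, including allocator overhead, exceeds the heap, device-side malloc returns NULL, and writing through that pointer would be an illegal memory access. The kernel shown here stops right after the malloc call, so whether this is what happens in the full code is only a guess. Checking the returned pointer for NULL and calling free before the kernel returns would help confirm or rule it out.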

0 answers:

No answers yet.