Hi all,
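I want to use PyCUDA to generate an MxN image where, most of the time, M is not equal to N (for example, a 160x142 image). When I run the code, I notice that many values are missing from the sequence of offsets.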
I tried changing the grid and block values as follows:
if xw % BLOCK_SIZE != 0:
    grid_xw = xw//BLOCK_SIZE+1
else:
    grid_xw = xw//BLOCK_SIZE
if yw % BLOCK_SIZE != 0:
    grid_yw = yw//BLOCK_SIZE+1
else:
    grid_yw = yw//BLOCK_SIZE
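Each if/else pair above is a ceiling division: it computes how many BLOCK_SIZE-wide blocks are needed to cover xw (or yw) pixels. Here is a minimal equivalent sketch. `blocks_needed` is a hypothetical helper name, not part of PyCUDA.

```python
BLOCK_SIZE = 32

def blocks_needed(n, block_size=BLOCK_SIZE):
    """Ceiling division: how many blocks of block_size threads cover n pixels."""
    return (n + block_size - 1) // block_size

def blocks_needed_if_else(n, block_size=BLOCK_SIZE):
    """The same computation written with the if/else from the question."""
    if n % block_size != 0:
        return n // block_size + 1
    return n // block_size

# Both forms agree for every size checked here.
assert all(blocks_needed(n) == blocks_needed_if_else(n) for n in range(1, 1000))

print(blocks_needed(160), blocks_needed(142))  # 5 5
```

With 5 x 5 blocks of 32 x 32 threads, the launch covers 160 x 160 threads. For a 160 x 142 image, the last 18 rows of threads therefore fall outside the image, and only a bounds check such as (x < xw) && (y < yw) keeps them from doing any work.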
and added a bounds check inside the __global__ kernel, as follows:
if ((x < xw) && (y < yw))
{
    float _x = ((float)x * 0.5) + minimum_coordinate_x;
    float _y = ((float)y * 0.5) + minimum_coordinate_y;
    __syncthreads();
}
However, the order is still not correct, so this does not seem to be the right approach.
Below is a simplified version of my code.
import matplotlib.pyplot as plt
import pycuda.autoinit
import pycuda.driver as driver
from pycuda import gpuarray
from pycuda.compiler import SourceModule
import numpy as np

AREA_WIDTH = 60.0
grid_size = 0.5
BLOCK_SIZE = 32

ker = SourceModule("""
#include <stdio.h>

__global__ void image_ker(float *image, float minx, float miny, int xw, int yw)
{
    unsigned int x = threadIdx.x + blockIdx.x * blockDim.x;
    unsigned int y = threadIdx.y + blockIdx.y * blockDim.y;
    unsigned int offset = x + (y * blockDim.x * gridDim.x);

    float minimum_coordinate_x = minx;
    float minimum_coordinate_y = miny;
    __syncthreads();

    float _x = ((float)x * 0.5) + minimum_coordinate_x;
    float _y = ((float)y * 0.5) + minimum_coordinate_y;
    __syncthreads();

    printf("Thread x = %d, \t blockIdx x = %d, \t blockDim x = %d, \t thread y = %d, \t blockIdx y = %d, \t blockDim y = %d, \t offset = %d, \t x = %d, \t _x = %.3f\\n", threadIdx.x, blockIdx.x, blockDim.x, threadIdx.y, blockIdx.y, blockDim.y, offset, x, _x);
    __syncthreads();
}
""")

if __name__ == '__main__':
    image_ker = ker.get_function("image_ker")

    minx = 5.0 - AREA_WIDTH / 2.0
    miny = 15.0 - AREA_WIDTH / 2.0
    maxx = 25.0 + AREA_WIDTH / 2.0
    maxy = 26.0 + AREA_WIDTH / 2.0
    xw = int(round((maxx - minx) / grid_size))
    yw = int(round((maxy - miny) / grid_size))

    image = np.array([[0.0 for i in range(yw)]
                      for i in range(xw)], dtype=np.float32)

    if xw % BLOCK_SIZE != 0:
        grid_xw = xw//BLOCK_SIZE+1
    else:
        grid_xw = xw//BLOCK_SIZE
    if yw % BLOCK_SIZE != 0:
        grid_yw = yw//BLOCK_SIZE+1
    else:
        grid_yw = yw//BLOCK_SIZE

    image_gpu = gpuarray.to_gpu(image)
    image_ker(image_gpu, np.float32(minx), np.float32(miny), np.int32(xw), np.int32(yw),
              block=(BLOCK_SIZE, BLOCK_SIZE, 1),
              grid=(grid_xw, grid_yw, 1))
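The script allocates image with shape (xw, yw) = (160, 142). The simplified kernel does not write to image yet. If it eventually stores each pixel at image[offset] using a row-major offset of the form x + y * width (an assumption, since the write is not shown), the C-contiguous host array should have shape (yw, xw). With that shape, image[y, x] and image.ravel()[offset] refer to the same element. Here is a minimal NumPy sketch under that assumption. The pixel (17, 93) is arbitrary, and `flat` / `flat_q` are illustrative names.

```python
import numpy as np

# Bounds from the question's script.
AREA_WIDTH = 60.0
grid_size = 0.5
minx = 5.0 - AREA_WIDTH / 2.0   # -25.0
miny = 15.0 - AREA_WIDTH / 2.0  # -15.0
maxx = 25.0 + AREA_WIDTH / 2.0  #  55.0
maxy = 26.0 + AREA_WIDTH / 2.0  #  56.0
xw = int(round((maxx - minx) / grid_size))  # 160 columns
yw = int(round((maxy - miny) / grid_size))  # 142 rows

# Same zero-filled float32 buffer as the nested list comprehension, but one row per y.
image = np.zeros((yw, xw), dtype=np.float32)

# Fill a stand-in array with each element's own flat index to inspect the layout.
flat = np.arange(xw * yw).reshape(yw, xw)
x, y = 17, 93                  # an arbitrary pixel
offset = x + y * xw            # row-major offset, row stride = xw
print(flat[y, x], offset)      # 14897 14897

# With the question's (xw, yw) shape, [x, y] lands on a different flat index.
flat_q = np.arange(xw * yw).reshape(xw, yw)
print(flat_q[x, y])            # 2507
```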
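I expected the offsets to form a complete, ordered sequence of 22720 (160x142) values, starting at 0 and ending at 22719. However, with the code above many values are missing and they are not in order (e.g. 0-95, then 160, ...). For now, I suspect the problem is caused by my image dimensions being unequal (M != N) and that something goes wrong when I assign the blocks and the grid.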
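One way to test the M != N suspicion without relying on printf is to emulate the kernel's index arithmetic on the host for every launched thread and then check which offsets it produces. The sketch below makes three assumptions: BLOCK_SIZE = 32, a grid sized by ceiling division as in the if/else code, and the (x < xw) && (y < yw) bounds check from the earlier attempt. `launch_offsets` and `is_complete` are hypothetical helpers. The sketch compares the question's row stride, blockDim.x * gridDim.x, with the image width, xw.

```python
import numpy as np

BLOCK_SIZE = 32  # blockDim.x == blockDim.y, as in the question

def launch_offsets(xw, yw, stride_from_grid):
    """Emulate one 2D launch on the host and return the sorted in-bounds offsets.

    stride_from_grid=True:  offset = x + y * (blockDim.x * gridDim.x)  (question's formula)
    stride_from_grid=False: offset = x + y * xw                        (image width)
    """
    grid_x = (xw + BLOCK_SIZE - 1) // BLOCK_SIZE  # gridDim.x
    grid_y = (yw + BLOCK_SIZE - 1) // BLOCK_SIZE  # gridDim.y
    # Global coordinates of every launched thread:
    # x = threadIdx.x + blockIdx.x * blockDim.x, y = threadIdx.y + blockIdx.y * blockDim.y
    y, x = np.mgrid[0:grid_y * BLOCK_SIZE, 0:grid_x * BLOCK_SIZE]
    inside = (x < xw) & (y < yw)  # the bounds check from the question
    stride = BLOCK_SIZE * grid_x if stride_from_grid else xw
    return np.sort((x + y * stride)[inside])

def is_complete(offsets, n):
    """True if offsets is exactly 0, 1, ..., n - 1."""
    return np.array_equal(offsets, np.arange(n))

for w, h in [(160, 142), (150, 142)]:
    n = w * h
    print(w, h,
          is_complete(launch_offsets(w, h, True), n),
          is_complete(launch_offsets(w, h, False), n))
# 160 142 True True
# 150 142 False True
```

Under these assumptions, both strides give exactly 0..22719 for 160 x 142. This is because 160 is a multiple of 32, so blockDim.x * gridDim.x equals xw. For a width such as 150, the padded stride of 160 leaves gaps, while x + y * xw stays contiguous. Device-side printf output has no guaranteed order across threads and passes through a fixed-size buffer. Gaps or jumps in the printed lines may therefore not mean the offsets themselves are wrong. Copying the offsets back into an array and checking them on the host, as the emulation does, is likely more reliable.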
So, is there another way to get my offsets right? Or am I missing something in my code?
Thank you very much.