Hi everyone,
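This question is related to "pycuda memory offset not in sequence". To recap, I want to use PyCUDA to generate an MxN image where, most of the time, M is not equal to N (for example a 160x142 image). This time, however, my actual output does not match the output I expect.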
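Because the image is M x N with M != N, the memory layout of the 2-D array matters. NumPy arrays are row-major (C order) by default, so for an array of shape (rows, cols) the element [i][j] sits at flat position i * cols + j. Below is a minimal plain-Python sketch of that mapping, assuming the default C order. `flat_index` is a hypothetical helper; NumPy's built-in equivalent is `np.ravel_multi_index`.

```python
def flat_index(row, col, ncols):
    """Hypothetical helper: flat position of [row][col] in a row-major array with ncols columns."""
    return row * ncols + col

# An image of shape (160, 142) has 142 elements per row.
ROWS, COLS = 160, 142
print(flat_index(0, 1, COLS))                 # 1: next element in the same row
print(flat_index(1, 0, COLS))                 # 142: first element of the second row
print(flat_index(ROWS - 1, COLS - 1, COLS))   # 22719: last element (size is 22720)
```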
I tried setting the output to a constant, image[offset] = 10.0, and that works as expected. The problem appears when I try to reference a value instead, as in image[offset] = image_x[x].
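One property worth keeping in mind: a constant hides any indexing mismatch, because if every thread writes the same value the array ends up uniform whichever offset each thread computes. The sketch below emulates that on the CPU with hypothetical toy sizes (W = 4, H = 3 instead of 160 x 142) and a hypothetical `run_kernel` helper; it uses the same offset form as the posted kernel, `x + y * W`, where the posted kernel's stride `blockDim.x * gridDim.x` is 160 = W.

```python
# Hypothetical toy sizes: image[x][y] with shape (W, H); the real script uses (160, 142).
W, H = 4, 3

def run_kernel(value_for_x):
    """Emulate threads (x, y) writing value_for_x(x) to offset x + y * W, like the posted kernel."""
    flat = [0.0] * (W * H)
    for y in range(H):
        for x in range(W):
            flat[x + y * W] = value_for_x(x)
    # Read back as NumPy would for a C-ordered (W, H) array: image[x][y] -> flat[x * H + y].
    return [[flat[x * H + y] for y in range(H)] for x in range(W)]

constant = run_kernel(lambda x: 10.0)   # like image[offset] = 10.0
by_x = run_kernel(lambda x: float(x))   # like image[offset] = image_x[x]
print(constant)  # every entry is 10.0
print(by_x)      # [[0.0, 1.0, 2.0], [3.0, 0.0, 1.0], [2.0, 3.0, 0.0], [1.0, 2.0, 3.0]]
```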
Below is the code I am using.
import matplotlib.pyplot as plt
import pycuda.autoinit
import pycuda.driver as driver
from pycuda import gpuarray
from pycuda.compiler import SourceModule
import numpy as np

AREA_WIDTH = 60.0
grid_size = 0.5
BLOCK_SIZE = 32

ker = SourceModule("""
__global__ void image_ker(float *image, float *image_x, float *image_y)
{
    unsigned int x = threadIdx.x + blockIdx.x * blockDim.x;
    unsigned int y = threadIdx.y + blockIdx.y * blockDim.y;
    unsigned int offset = x + (y * blockDim.x * gridDim.x);
    float x_value = image_x[x];
    __syncthreads();
    if ((x < 160) && (y < 142))
    {
        image[offset] = x_value;
        image_x[x] = x_value;
    }
    __syncthreads();
}
""")

if __name__ == '__main__':
    image_ker = ker.get_function("image_ker")
    minx = 5.0 - AREA_WIDTH / 2.0
    miny = 15.0 - AREA_WIDTH / 2.0
    maxx = 25.0 + AREA_WIDTH / 2.0
    maxy = 26.0 + AREA_WIDTH / 2.0
    xw = int(round((maxx - minx) / grid_size))
    yw = int(round((maxy - miny) / grid_size))
    image = np.array([[0.0 for i in range(yw)]
                      for i in range(xw)], dtype=np.float32)
    print(minx, miny, maxx, maxy, xw, yw)
    image_x = np.array([(np.float32(i) * grid_size + minx) for i in range(xw)], dtype=np.float32)
    image_y = np.array([(np.float32(i) * grid_size + miny) for i in range(yw)], dtype=np.float32)
    image_gpu = gpuarray.to_gpu(image)
    image_x_gpu = gpuarray.to_gpu(image_x)
    image_y_gpu = gpuarray.to_gpu(image_y)
    image_ker(image_gpu, image_x_gpu, image_y_gpu, block=(32, 32, 1),
              grid=(5, 5, 1))
    image = image_gpu.get()
    image_x = image_x_gpu.get()
    image_y = image_y_gpu.get()
    # print(grid_xw, grid_yw)
    for ix in range(xw):
        for jy in range(yw):
            print("x, {}, image[{}][{}], {}".format(image_x[ix], ix, jy, image[ix][jy]))
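The sizes the script computes, and the row stride hidden in the kernel's offset formula, can be checked on the CPU without a GPU. This sketch repeats the script's arithmetic; the variable names `row_stride_in_kernel` and `row_stride_in_numpy` are illustrative, not from the original code.

```python
AREA_WIDTH = 60.0
grid_size = 0.5

minx = 5.0 - AREA_WIDTH / 2.0
miny = 15.0 - AREA_WIDTH / 2.0
maxx = 25.0 + AREA_WIDTH / 2.0
maxy = 26.0 + AREA_WIDTH / 2.0
xw = int(round((maxx - minx) / grid_size))
yw = int(round((maxy - miny) / grid_size))

block = (32, 32, 1)
grid = (5, 5, 1)
threads_x = block[0] * grid[0]   # x covers 0..159
threads_y = block[1] * grid[1]   # y covers 0..159; the kernel's y < 142 check drops the rest
row_stride_in_kernel = block[0] * grid[0]  # blockDim.x * gridDim.x in the offset formula
row_stride_in_numpy = yw  # image has shape (xw, yw), so image[ix][jy] sits at ix * yw + jy

print(xw, yw, threads_x, threads_y)               # 160 142 160 160
print(row_stride_in_kernel, row_stride_in_numpy)  # 160 142
```

If the two strides printed on the last line differ, the kernel and NumPy disagree about where each row of the image starts.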
I expect the output to be:
x, -25.0, image[0][0], -25.0
x, -25.0, image[0][1], -25.0
x, -25.0, image[0][2], -25.0
x, -25.0, image[0][3], -25.0
x, -25.0, image[0][4], -25.0
x, -25.0, image[0][5], -25.0
...
x, -4.0, image[42][77], -4.0
x, -4.0, image[42][78], -4.0
x, -4.0, image[42][79], -4.0
x, -4.0, image[42][80], -4.0
x, -4.0, image[42][81], -4.0
...
x, 54.5, image[159][138], 54.5
x, 54.5, image[159][139], 54.5
x, 54.5, image[159][140], 54.5
x, 54.5, image[159][141], 54.5
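The expected values follow from how the script fills image_x: image_x[ix] = ix * grid_size + minx, with minx = -25.0 and grid_size = 0.5. A small check of the three rows shown above (`x_coord` is a hypothetical helper name):

```python
minx, grid_size = -25.0, 0.5

def x_coord(ix):
    """Hypothetical helper: the value the script stores in image_x[ix]."""
    return ix * grid_size + minx

for ix in (0, 42, 159):
    print(ix, x_coord(ix))   # 0 -25.0, then 42 -4.0, then 159 54.5
```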
However, my actual output is:
x, -25.0, image[0][0], -25.0
x, -25.0, image[0][1], -24.5
x, -25.0, image[0][2], -24.0
x, -25.0, image[0][3], -23.5
x, -25.0, image[0][4], -23.0
x, -25.0, image[0][5], -22.5
...
x, -4.0, image[42][77], 35.5
x, -4.0, image[42][78], 36.0
x, -4.0, image[42][79], 36.5
x, -4.0, image[42][80], 37.0
x, -4.0, image[42][81], 37.5
...
x, 54.5, image[159][138], 53.0
x, 54.5, image[159][139], 53.5
x, 54.5, image[159][140], 54.0
x, 54.5, image[159][141], 54.5
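The observed numbers can be reproduced on the CPU, which suggests an explanation. This sketch assumes that the kernel writes image_x[x] to offset x + y * 160, as posted, and that NumPy then reads the buffer as a C-ordered (160, 142) array. Under those assumptions, image[ix][jy] is flat[ix * 142 + jy], which holds image_x[(ix * 142 + jy) % 160]. The helper `image(ix, jy)` is hypothetical.

```python
XW, YW = 160, 142          # image shape (xw, yw) from the script
STRIDE = 32 * 5            # blockDim.x * gridDim.x used in the kernel's offset
minx, grid_size = -25.0, 0.5
image_x = [ix * grid_size + minx for ix in range(XW)]

# Emulate the kernel: thread (x, y) writes image_x[x] to offset x + y * STRIDE.
flat = [0.0] * (XW * YW)
for y in range(YW):          # threads with y >= 142 are skipped by the kernel's if
    for x in range(XW):
        flat[x + y * STRIDE] = image_x[x]

def image(ix, jy):
    """Hypothetical helper: NumPy-style read of a C-ordered (XW, YW) array."""
    return flat[ix * YW + jy]

for ix, jy in [(0, 1), (42, 77), (159, 138), (159, 141)]:
    print(ix, jy, image_x[ix], image(ix, jy))
# 0 1 -25.0 -24.5
# 42 77 -4.0 35.5
# 159 138 54.5 53.0
# 159 141 54.5 54.5
```

These match the rows of the actual output above, so the values themselves are correct; they are written with a row stride of 160 but read back with a row stride of 142.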
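In the __global__ function,
image[offset] = x_value;
image_x[x] = x_value;
image_x[x] returns the correct values, but image[offset] returns some kind of scrambled result: along each row the values step by 0.5 instead of staying equal to image_x[ix].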
My question is: is there some way to get the correct result, or am I missing something when assigning image_x[x] to image[offset]?
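Two possible ways to make the write and read layouts agree are sketched below as CPU emulations. These are assumptions, not tested on a GPU.

* Option 1 keeps the host array at shape (xw, yw) and changes the kernel to `offset = x * 142 + y`.
* Option 2 keeps the kernel's `offset = x + y * W` and allocates the host array as shape (yw, xw), then reads image[jy][ix].

Option 2 only matches the current kernel because blockDim.x * gridDim.x happens to equal xw = 160, so passing xw in explicitly would be safer. It also keeps threads with consecutive x writing to adjacent addresses.

```python
XW, YW = 160, 142                     # sizes from the script
minx, grid_size = -25.0, 0.5
image_x = [ix * grid_size + minx for ix in range(XW)]

# Option 1 (assumed fix): keep image shaped (XW, YW) and index it the NumPy way.
# CUDA equivalent (sketch): unsigned int offset = x * 142 + y;
flat1 = [0.0] * (XW * YW)
for y in range(YW):
    for x in range(XW):
        flat1[x * YW + y] = image_x[x]
image = [flat1[ix * YW:(ix + 1) * YW] for ix in range(XW)]   # image[ix][jy]

# Option 2 (assumed alternative): keep offset = x + y * XW but allocate the host
# array as shape (YW, XW), e.g. np.zeros((yw, xw), np.float32), and read image_t[jy][ix].
flat2 = [0.0] * (XW * YW)
for y in range(YW):
    for x in range(XW):
        flat2[x + y * XW] = image_x[x]
image_t = [flat2[jy * XW:(jy + 1) * XW] for jy in range(YW)]  # image_t[jy][ix]

print(image[0][1], image[42][77], image[159][141])        # -25.0 -4.0 54.5
print(image_t[1][0], image_t[77][42], image_t[141][159])  # -25.0 -4.0 54.5
```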