Question

Hi everyone,

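I want to use PyCUDA to generate an MxN image, where most of the time M is not equal to N, for example a 160x142 image. When I run my code, I notice that many runs of offsets are missing from the output.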

I tried changing the grid and block values as follows:

    if xw % BLOCK_SIZE != 0:
        grid_xw = xw//BLOCK_SIZE+1
    else:
        grid_xw = xw//BLOCK_SIZE

    if yw % BLOCK_SIZE != 0:
        grid_yw = yw//BLOCK_SIZE+1
    else:
        grid_yw = yw//BLOCK_SIZE
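
The two if/else blocks above compute a ceiling division. A minimal sketch of the same calculation as a helper, where `ceil_div` is a hypothetical name that is not part of the original code:

```python
def ceil_div(n, block):
    """Smallest number of blocks of size `block` that covers n elements."""
    return (n + block - 1) // block

BLOCK_SIZE = 32
xw, yw = 160, 142  # image size used as the example in this question

grid_xw = ceil_div(xw, BLOCK_SIZE)
grid_yw = ceil_div(yw, BLOCK_SIZE)

# Threads launched per axis; this can exceed the image size, which is
# why the kernel needs a bounds check.
print(grid_xw * BLOCK_SIZE, grid_yw * BLOCK_SIZE)
```

For the 160x142 case this launches more threads along y than there are rows, so some threads fall outside the image.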

and added a bounds check inside the `__global__` kernel, as follows:

    if ((x < xw) && (y < yw))
    {
        float _x = ((float)x * 0.5) + minimum_coordinate_x;
        float _y = ((float)y * 0.5) + minimum_coordinate_y;

        __syncthreads();
    }

But the order is still incorrect, so this does not seem to be the right approach.
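
For reference, a guard like the one above only skips the padding threads; whether the remaining offsets are contiguous depends on the row stride used to compute `offset`. The following host-side sketch emulates the launch sequentially in plain Python (it is not GPU code, and `emulate_offsets` and `is_contiguous` are hypothetical helpers). It compares the stride `blockDim.x * gridDim.x` used in the kernel with a stride of `xw`:

```python
def emulate_offsets(xw, yw, block, use_image_width):
    """Return the offsets produced by the in-bounds threads of a 2D launch."""
    grid_x = (xw + block - 1) // block
    grid_y = (yw + block - 1) // block
    # Row stride: the image width, or the padded grid width as in the kernel.
    stride = xw if use_image_width else block * grid_x
    offsets = []
    for y in range(grid_y * block):
        for x in range(grid_x * block):
            if x < xw and y < yw:  # same guard as in the kernel
                offsets.append(x + y * stride)
    return offsets

def is_contiguous(offsets, n):
    """True if the offsets are exactly 0 .. n-1, in any order."""
    return sorted(offsets) == list(range(n))

# 160 is a multiple of 32, so the padded stride equals xw for 160x142.
padded_160 = is_contiguous(emulate_offsets(160, 142, 32, False), 160 * 142)
# For a width such as 150, the padded stride is 160 and leaves gaps.
padded_150 = is_contiguous(emulate_offsets(150, 142, 32, False), 150 * 142)
exact_150 = is_contiguous(emulate_offsets(150, 142, 32, True), 150 * 142)
print(padded_160, padded_150, exact_150)
```

Under these assumptions, using `xw` as the stride keeps the offsets contiguous for any width, while the padded stride only works when the width is a multiple of the block size.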

Below is a simplified version of my code.

import matplotlib.pyplot as plt
import pycuda.autoinit
import pycuda.driver as driver
from pycuda import gpuarray
from pycuda.compiler import SourceModule
import numpy as np

AREA_WIDTH = 60.0
grid_size = 0.5
BLOCK_SIZE = 32

ker = SourceModule("""
#include <stdio.h>

__global__ void image_ker(float *image, float minx, float miny, int xw, int yw)
{
    unsigned int x = threadIdx.x + blockIdx.x * blockDim.x;
    unsigned int y = threadIdx.y + blockIdx.y * blockDim.y;
    unsigned int offset = x + (y * blockDim.x * gridDim.x);
    float minimum_coordinate_x = minx;
    float minimum_coordinate_y = miny;

    __syncthreads();

    float _x = ((float)x * 0.5) + minimum_coordinate_x;
    float _y = ((float)y * 0.5) + minimum_coordinate_y;

    __syncthreads();

    printf("Thread x = %u, \t blockIdx x = %u, \t blockDim x = %u, \t thread y = %u, \t blockIdx y = %u, \t blockDim y = %u, \t offset = %u, \t x = %u, \t _x = %.3f\\n", threadIdx.x, blockIdx.x, blockDim.x, threadIdx.y, blockIdx.y, blockDim.y, offset, x, _x);

    __syncthreads();
}
""")

if __name__ == '__main__':

    image_ker = ker.get_function("image_ker")

    minx = 5.0 - AREA_WIDTH / 2.0
    miny = 15.0 - AREA_WIDTH / 2.0
    maxx = 25.0 + AREA_WIDTH / 2.0
    maxy = 26.0 + AREA_WIDTH / 2.0


    xw = int(round((maxx - minx) / grid_size))
    yw = int(round((maxy - miny) / grid_size))
    image = np.zeros((xw, yw), dtype=np.float32)

    if xw % BLOCK_SIZE != 0:
        grid_xw = xw//BLOCK_SIZE+1
    else:
        grid_xw = xw//BLOCK_SIZE

    if yw % BLOCK_SIZE != 0:
        grid_yw = yw//BLOCK_SIZE+1
    else:
        grid_yw = yw//BLOCK_SIZE

    image_gpu = gpuarray.to_gpu(image)
    image_ker(image_gpu, np.float32(minx), np.float32(miny), np.int32(xw), np.int32(yw), block=(BLOCK_SIZE, BLOCK_SIZE, 1),
             grid=(grid_xw, grid_yw, 1))

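I expect the offsets to run sequentially over 22720 values in total (160x142), starting at 0 and ending at 22719. However, with the code above many offsets are missing from the output and they are not in order (for example 0-95, then 160 ...). At the moment I suspect the problem is caused by my image dimensions being unequal (M != N), and that something goes wrong when I set up the blocks and the grid.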
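
One thing worth checking here: CUDA does not guarantee the order in which threads, warps, or blocks run, so lines printed with device-side `printf` generally do not appear in offset order. The device `printf` buffer also has a fixed size, and when a launch produces more output than fits, older output is overwritten, which can look like missing offsets. A minimal sketch of how captured offsets could be checked independently of print order; the shuffled list below merely stands in for offsets parsed from the printf output and is an assumption of this example:

```python
import random

xw, yw = 160, 142
n = xw * yw  # number of expected offsets

# Stand-in for offsets parsed from printf output: complete, but unordered.
printed = [x + y * xw for y in range(yw) for x in range(xw)]
random.Random(0).shuffle(printed)

# Compare as sets, so ordering is ignored and only coverage is tested.
expected = set(range(n))
missing = sorted(expected - set(printed))
duplicates = len(printed) - len(set(printed))

print("appears in order:", printed == sorted(printed))
print("missing:", len(missing), "duplicates:", duplicates)
```

If offsets are still missing after a check like this, writing each thread's offset into the output array and inspecting it on the host may be more reliable than reading printed lines.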

So I would like to know: is there another way to get my offsets right? Or am I missing something in my code?
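
A related point for when the kernel starts writing to `image`: the host array is created with shape `(xw, yw)`, and NumPy stores arrays in row-major (C) order by default, so element `[i][j]` sits at flat index `i * yw + j`. An offset of the form `x + y * stride` makes `x` the fastest-varying index, which corresponds to an array of shape `(yw, xw)`. A small pure-Python sketch of the two conventions, assuming a stride of `xw` (the helper names are hypothetical):

```python
xw, yw = 160, 142

def flat_index_host(i, j):
    """Row-major flat index of element [i][j] in an array of shape (xw, yw)."""
    return i * yw + j

def flat_index_kernel(x, y):
    """Flat index computed in the kernel when the row stride equals xw."""
    return x + y * xw

# The same (3, 2) pair maps to different flat positions.
kernel_pos = flat_index_kernel(3, 2)  # 3 + 2 * 160
host_pos = flat_index_host(3, 2)      # 3 * 142 + 2
print(kernel_pos, host_pos)

# Viewing the buffer with shape (yw, xw) makes the conventions agree:
# element [y][x] of a (yw, xw) array has flat index y * xw + x.
agree = all(flat_index_kernel(x, y) == y * xw + x
            for y in range(yw) for x in range(xw))
print(agree)
```

Under that assumption, allocating the host array as `(yw, xw)` would keep host and device indexing consistent.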

Thank you very much.

PyCUDA memory offsets are not in order
