Question

My 3D image is 512 * 512 * 512. I have to process every voxel individually, but I can't get the dimensions right to obtain the x, y and z values for each pixel.

In my kernel I have:

int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int z = blockIdx.z * blockDim.z + threadIdx.z;

I'm launching it with:

Kernel<<<dim3(8,8), dim3(8,8,16)>>>();

I chose those because 64 blocks of 1024 threads each should give me every pixel. But how do I get the coordinate values with these dimensions?

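When calling the kernel I have to set the dimensions so that the x, y and z values actually run from 0 to 511 (which gives me the position of every pixel). But with every combination I try, my kernel either doesn't run, or it runs but the values don't go high enough.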

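The program should give each thread one pixel (x, y, z) that corresponds to that pixel in the image. In the simplest version, I'm just printing the coordinates to see whether all of them get printed.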

Any help?

Edit

My GPU properties:

Compute capability: 2.0
Name: GeForce GTX 480

My program code, just for testing:

#include <stdio.h>
#include <cuda.h>
#include <stdlib.h>

// Device code
__global__ void Kernel()
{
    // Here I should somehow get the x, y and z values for every pixel possible in the 512*512*512 image
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    printf("Coords: (%i, %i, %i)\n", x, y, z);
}

// Host code
int main(int argc, char** argv) {

    Kernel<<<dim3(8, 8), dim3(8,8,16)>>>(); //This invokes the kernel
    cudaDeviceSynchronize();

    return 0;
}

Answer 1

To cover a 512x512x512 space with the indexing you have shown (i.e. one thread per voxel), you need a kernel launch like this:

Kernel<<<dim3(64,64,64), dim3(8,8,8)>>>();

When I multiply the grid and block components of any dimension:

64*8

I get 512. That gives a grid of 512 threads in each of the 3 dimensions. Your indexing will then work as-is to produce one unique thread per voxel.

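The above assumes a cc2.0 or higher device (your mention of 1024 threads per block suggests a cc2.0+ device), which permits 3D grids. If you have a cc1.x device, you will need to modify the indexing.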

In that case, you might want something like this:

int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = (blockIdx.y%64) * blockDim.y + threadIdx.y;
int z = (blockIdx.y/64) * blockDim.z + threadIdx.z;

along with a kernel launch like this:

Kernel<<<dim3(64,4096), dim3(8,8,8)>>>();

Here is a complete example (cc2.0), based on the code you have now shown:

$ cat t604.cu
#include <stdio.h>

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

// Device code
__global__ void Kernel()
{
    // Here I should somehow get the x, y and z values for every pixel possible in the 512*512*512 image
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    if ((x==511)&&(y==511)&&(z==511)) printf("Coords: (%i, %i, %i)\n", x, y, z);
}

// Host code
int main(int argc, char** argv) {
    cudaFree(0);
    cudaCheckErrors("CUDA is not working correctly");
    Kernel<<<dim3(64, 64, 64), dim3(8,8,8)>>>(); //This invokes the kernel
    cudaDeviceSynchronize();
    cudaCheckErrors("kernel fail");

    return 0;
}
$ nvcc -arch=sm_20 -o t604 t604.cu
$ cuda-memcheck ./t604
========= CUDA-MEMCHECK
Coords: (511, 511, 511)
========= ERROR SUMMARY: 0 errors
$

Note that I chose to print only one line. I didn't want to deal with a 512x512x512-line printout: it would take a very long time to run, and in-kernel printf is limited in output volume anyway.
