Question

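The documentation of cudaMalloc3D says:

The returned cudaPitchedPtr contains additional fields xsize and ysize, the logical width and height of the allocation, which are equivalent to the width and height extent parameters provided by the programmer during allocation.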

However, if I run the following minimal example:

#include <stdio.h>
#include <stdlib.h>        // exit()
#include <cuda_runtime.h>

#define Nrows 64
#define Ncols 64
#define Nslices 16

/********************/
/* CUDA ERROR CHECK */
/********************/
// --- Credit to http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) { exit(code); }
    }
}

// --- A macro rather than a function, so that __FILE__ and __LINE__ refer to the call site
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }

/********/
/* MAIN */
/********/
int main() {

    // --- 3D pitched allocation (the extent width is given in bytes)
    cudaExtent extent = make_cudaExtent(Ncols * sizeof(float), Nrows, Nslices);
    cudaPitchedPtr devPitchedPtr;
    gpuErrchk(cudaMalloc3D(&devPitchedPtr, extent));

    // --- xsize, pitch and ysize are size_t, so print them with %zu
    printf("xsize = %zu; xsize in bytes = %zu; ysize = %zu\n", devPitchedPtr.xsize, devPitchedPtr.pitch, devPitchedPtr.ysize);

    gpuErrchk(cudaFree(devPitchedPtr.ptr));

    return 0;
}

I get:

xsize = 256; xsize in bytes = 512; ysize = 64

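So ysize is indeed equal to Nrows, but xsize is equal neither to Ncols nor to (xsize in bytes) / sizeof(float).

Could you help me understand the meaning of the xsize and ysize fields of the cudaPitchedPtr returned by cudaMalloc3D?

Thank you very much in advance for any help.

My system: Windows 10, CUDA 8.0, GT 920M, cc 3.5.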

Answer 1

xsize = Ncols * sizeof(float)

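xsize is the logical width of the allocation in bytes, not the pitched width:

Logical width = Ncols * sizeof(float) = 64 * 4 = 256 bytes

Pitched width (pitch) = 512 bytes

It is equivalent (identical) to the width parameter you supplied during allocation, i.e. the first argument passed to make_cudaExtent.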

Answer 2

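A closely related, working example (answered by yourself, @JackOLant, in another post) is here; it shows how to use cudaMalloc3D and related functions.

I have settled on a rule of thumb that more or less answers this question, and I would like to share it: "In the context of the CUDA libraries, unless we are working with cudaArrays, width means nCols * sizeof(datatype) in bytes, and pitch means width + 0 or width + some padding, also in bytes (depending on the array size and the GPU hardware)."

P.S. When using CUDA arrays, we specify the width as the number of elements in a row (nCols), not as a number of bytes. This is because a CUDA array takes care of its own internal memory layout, so we do not need to supply the width in bytes.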

Meaning of the xsize and ysize fields in the pitched pointer output by cudaMalloc3D
