Question

I have a cpp file in which I create an image and store the data through the myOutput pointer:

int Rows = 80;
int Cols = 64;

for (int i = 0; i < Rows; i++) {
    for (int j = 0; j < Cols; j++) {
        X = 1.0f * ((float) i - (float) Rows / 2) / (float) Rows;
        Y = 2.0f * ((float) j - (float) Cols / 2) / (float) Cols;
        .....
        myOutput->Re = cosf( ......);
        myOutput->Im = sinf(.......);

        ++myOutput;
    }
}

Then, in CUDA, I read it like this:

int bx = blockIdx.x , by = blockIdx.y;
int tx = threadIdx.x , ty = threadIdx.y;

int RowIdx = ty + by * TILE_WIDTH;
int ColIdx = tx + bx * TILE_WIDTH;


Index = RowIdx * Cols + ColIdx;

//copy input data to shared memory
myshared[ty+1][tx+1] = *( devInputArray + Index );

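(So the myOutput data generated in the cpp file is loaded into devInputArray.)

Now I want to process many images at the same time.

So in the cpp file, the following addition is needed (for example, for 2 images):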

int ImagesNb = 2;

for ( ImagesIdx = 0; ImagesIdx < ImagesNb; ImagesIdx++ ) {
    for (int i = 0; i < Rows; i++) {
        for (int j = 0; j < Cols; j++) {
            X = (ImagesIdx + 1) * 1.0f * ((float) i - (float) Rows / 2) / (float) Rows;
            Y = (ImagesIdx + 1) * 2.0f * ((float) j - (float) Cols / 2) / (float) Cols;
            ...

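But now I don't know how to read the data in CUDA.

I don't know how to take the number of images into account.

Before, I had one pointer holding the data (80 x 64).

Now each image still has the same dimensions, but there is more data.

I have to change this: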

Index = RowIdx * Cols + ColIdx;

//copy input data to shared memory
myshared[ty+1][tx+1] = *( devInputArray + Index );

But I can't figure out how!

I hope this is clear!

Update

I am trying something like this:

 int bx = blockIdx.x , by = blockIdx.y ,  bz = blockIdx.z;
 int tx = threadIdx.x , ty = threadIdx.y , tz = threadIdx.z;

 int RowIdx = ty + by * TILE_WIDTH;
 int ColIdx = tx + bx * TILE_WIDTH;
 int ImagesIdx = tz + bz * blockDim.z;

 Index = RowIdx * Cols + ColIdx + Rows * Cols * ImagesIdx;

and:

dim3 dimGrid( ImagesNb * (Cols / TILE_WIDTH)  , ImagesNb * (Rows / TILE_WIDTH) , ImagesNb);
dim3 dimBlock( TILE_WIDTH , TILE_WIDTH , 2);

But if I try it with 2 images, I don't get correct results.

Answer 1

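OK, to work with multiple images you have to add an extra dimension to the shared variable to account for the number of images.

You also have to read the data in the appropriate way.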
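To make "read the data in the appropriate way" concrete, here is a minimal CPU-side sketch of the indexing, not a definitive kernel. It assumes the buffer is image-major (image 0 first, then image 1, as the nested host loops in the question produce), that TILE_WIDTH is 16 (the question does not give its value), and that the launch uses one z-block per image with a block depth of 1. The names Dim3, flatIndex and emulateLaunch are hypothetical helpers introduced only for this illustration. The loops stand in for blockIdx and threadIdx so the index arithmetic can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed tile size; the question does not state the value of TILE_WIDTH.
constexpr int TILE_WIDTH = 16;

// Hypothetical stand-in for CUDA's dim3, used only for this CPU emulation.
struct Dim3 { int x, y, z; };

// Flat offset of element (row, col) of image `img` in an image-major buffer,
// i.e. the layout produced by the nested host loops in the question.
inline std::size_t flatIndex(int img, int row, int col, int rows, int cols) {
    assert(img >= 0 && row >= 0 && row < rows && col >= 0 && col < cols);
    return (static_cast<std::size_t>(img) * rows + row) * cols + col;
}

// Walks every (block, thread) pair of the assumed launch configuration on the
// CPU and counts how many times each buffer element would be read.
std::vector<int> emulateLaunch(int imagesNb, int rows, int cols) {
    const Dim3 block{TILE_WIDTH, TILE_WIDTH, 1};
    // One z-block per image; x and y cover a single image, rounded up.
    const Dim3 grid{(cols + TILE_WIDTH - 1) / TILE_WIDTH,
                    (rows + TILE_WIDTH - 1) / TILE_WIDTH,
                    imagesNb};
    std::vector<int> visits(static_cast<std::size_t>(imagesNb) * rows * cols, 0);

    for (int bz = 0; bz < grid.z; ++bz)
    for (int by = 0; by < grid.y; ++by)
    for (int bx = 0; bx < grid.x; ++bx)
    for (int tz = 0; tz < block.z; ++tz)
    for (int ty = 0; ty < block.y; ++ty)
    for (int tx = 0; tx < block.x; ++tx) {
        const int RowIdx    = ty + by * block.y;
        const int ColIdx    = tx + bx * block.x;
        const int ImagesIdx = tz + bz * block.z;
        // Bounds guard: threads outside the data do nothing.
        if (RowIdx >= rows || ColIdx >= cols || ImagesIdx >= imagesNb) continue;
        ++visits[flatIndex(ImagesIdx, RowIdx, ColIdx, rows, cols)];
    }
    return visits;
}
```

Calling emulateLaunch(2, 80, 64) should report exactly one visit for each of the 2 * 80 * 64 elements. Note that the grid in the update multiplies the x and y dimensions by ImagesNb and also uses a block depth of 2 with ImagesNb z-blocks, so without a bounds guard RowIdx, ColIdx and ImagesIdx would all run past one image's extent; that may be the source of the incorrect results. If you keep a block depth greater than 1, the shared tile would need a matching leading dimension indexed by tz.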
