The correct way to copy and print a 2D array on a CUDA device

Time: 2016-12-19 09:49:53

Tags: c++ cuda

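Hello, I want to create a 2D array (N × (N + 1)) filled with random values on the host, copy it to the device, and print both the host array and the device array to check that I did it correctly.

The problem is that the host array prints fine, while the device array is missing many values and looks scrambled. I think something is wrong with how I handle the pointers, but I don't know what.

The following code creates the matrix and copies it to the device: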

#define DEC_COUNT (1000)
void create_matrix(cuda_matrix *matrix, int var_cnt, bool clear)
{
    cudaError error;
    double **h_matrix = (double **)malloc(sizeof(double *) * var_cnt);
    assert(h_matrix != NULL);


    if (clear) {
        for (int y = 0; y < var_cnt; y++) {
            h_matrix[y] = (double *)calloc((var_cnt+1), sizeof(double));
            assert(h_matrix[y] != NULL);
        }
    } else {
        for (int y = 0; y < var_cnt; y++) {
            h_matrix[y] = (double *)malloc(sizeof(double) * (var_cnt+1));
            assert(h_matrix[y] != NULL);
            for (int i = 0; i < var_cnt+1; ++i){
                srand(time(NULL)*(i+1)*(y+1));
                h_matrix[y][i] = ((double)rand()/(double)RAND_MAX)*DEC_COUNT;
            }
        }
    }

    printf("h_matrix:\n");
    print_matrix(h_matrix, var_cnt);

    error = cudaMallocPitch(&(matrix->d_matrix), &(matrix->pitch),
            sizeof(double)*(var_cnt+1), var_cnt);
    checkCudaErrors(error);

    error = cudaMemcpy2D(matrix->d_matrix, matrix->pitch, h_matrix,
            sizeof(double)*(var_cnt+1), sizeof(double)*(var_cnt+1), var_cnt, cudaMemcpyHostToDevice);
    checkCudaErrors(error);

    printf("d_matrix\n");

    print_matrix<<<1,1>>>(matrix->d_matrix, matrix->var_count, matrix->pitch);
    checkCudaErrors(cudaDeviceSynchronize());

    free_matrix(h_matrix, var_cnt);
}
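
A likely cause sits in the cudaMemcpy2D call above: it reads the source as one block in which row r starts at src + r * spitch, but h_matrix is an array of var_cnt pointers to separately allocated rows, so the copy starts at the pointer table rather than at the values. One way around this, sketched here under assumptions, is to keep the host matrix in a single contiguous allocation. HostMatrix and fill_random are hypothetical names, not from the original code, and the sketch seeds the generator once instead of once per element.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the original post): a row-major matrix with
// var_cnt rows and var_cnt + 1 columns stored in ONE contiguous allocation.
struct HostMatrix {
    int rows;
    int cols;
    std::vector<double> data;

    explicit HostMatrix(int var_cnt)
        : rows(var_cnt), cols(var_cnt + 1),
          data(static_cast<std::size_t>(var_cnt) * (var_cnt + 1), 0.0) {}

    // Element (y, i): row y, column i.
    double &at(int y, int i) {
        return data[static_cast<std::size_t>(y) * cols + i];
    }

    // Bytes per row: usable as both the source pitch and the copy width.
    std::size_t row_bytes() const {
        return sizeof(double) * static_cast<std::size_t>(cols);
    }
};

// Fill the matrix with values in [0, max_value), seeding the generator once.
inline void fill_random(HostMatrix &m, unsigned seed, double max_value) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, max_value);
    for (double &v : m.data) v = dist(gen);
}
```

With this layout the existing call would presumably take m.data.data() as the source and m.row_bytes() as both the source pitch and the width; the device allocation and the print kernel would stay as they are.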

The CUDA print kernel:

__global__ void print_matrix(double *d_matrix, int height, size_t pitch)
{
    //assert(matrix != NULL);
    /*double *d_matrix = matrix->d_matrix;
    int height = matrix->var_count;
    size_t pitch = matrix->pitch;*/
    for (int j = 0; j < height; j++) {
        // image row
        double *row = (double*)((char*)d_matrix + j * pitch);
        for (int i = 0; i < height+1; i++){
            if (i == height)
                printf("|%.1f", (row[i] == -0.0)? 0.0 : row[i]);
            else
                printf("%.1f ", (row[i] == -0.0)? 0.0 : row[i]);
        }
        printf("\n");
    }
    printf("\n");
}

After running the program I get this output; h_matrix and d_matrix should be identical!

h_matrix:
80.4 465.7 568.3 663.8 554.6 650.4 748.9 642.3 |738.4
465.7 663.8 650.4 642.3 333.0 821.3 309.0 299.4 |495.5
568.3 650.4 738.4 821.3 407.5 495.5 584.6 168.1 |761.1
663.8 642.3 821.3 299.4 487.1 168.1 141.5 829.1 |513.6
554.6 333.0 407.5 487.1 559.5 843.8 414.8 490.9 |566.1
650.4 821.3 495.5 168.1 843.8 513.6 187.8 150.8 |322.8
748.9 309.0 584.6 141.5 414.8 187.8 249.8 523.5 |85.2
642.3 299.4 168.1 829.1 490.9 150.8 523.5 180.4 |344.3

d_matrix
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 |0.0
0.0 80.4 465.7 568.3 663.8 554.6 650.4 748.9 |642.3
738.4 0.0 465.7 663.8 650.4 642.3 333.0 821.3 |309.0
299.4 495.5 0.0 568.3 650.4 738.4 821.3 407.5 |495.5
584.6 168.1 761.1 0.0 663.8 642.3 821.3 299.4 |487.1
168.1 141.5 829.1 513.6 0.0 554.6 333.0 407.5 |487.1
559.5 843.8 414.8 490.9 566.1 0.0 650.4 821.3 |495.5
168.1 843.8 513.6 187.8 150.8 322.8 0.0 748.9 |309.0
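
The shape of the d_matrix output is consistent with cudaMemcpy2D reading straight through host memory starting at the pointer array h_matrix: the first row would be pointer bits printed as doubles (tiny values that round to 0.0), and every later row begins one element further along, which is what happens if each malloc'd row occupies one more 8-byte slot than the 9 doubles the copy assumes. The host-only sketch here mimics that. copy_2d, simulate_heap and copy_like_question are hypothetical stand-ins rather than CUDA calls, and the heap layout is an assumption about a typical 64-bit allocator, not something any API guarantees.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only stand-in for the byte movement of a pitched 2D copy: copy
// `height` rows of `width` bytes, reading row r at src + r * spitch and
// writing it at dst + r * dpitch.
inline void copy_2d(void *dst, std::size_t dpitch, const void *src,
                    std::size_t spitch, std::size_t width, std::size_t height) {
    for (std::size_t r = 0; r < height; ++r) {
        std::memcpy(static_cast<char *>(dst) + r * dpitch,
                    static_cast<const char *>(src) + r * spitch, width);
    }
}

// Simulated heap (ASSUMED layout): n pointer-sized slots, one padding slot,
// then n blocks of [one bookkeeping slot][n + 1 values]. Pointers, padding
// and bookkeeping are modelled as 0.0; real values are encoded as
// 100 * row + column + 1 so their origin is easy to read.
inline std::vector<double> simulate_heap(int n) {
    std::vector<double> heap(static_cast<std::size_t>(n) + 1, 0.0);
    for (int y = 0; y < n; ++y) {
        heap.push_back(0.0);  // allocator bookkeeping before each row
        for (int i = 0; i <= n; ++i) heap.push_back(100.0 * y + i + 1);
    }
    return heap;
}

// Copy the simulated heap the way the question's call does: source pitch and
// width are both one row (n + 1 doubles), starting at the pointer array.
// Destination padding is pre-filled with -1.0 and left untouched.
inline std::vector<double> copy_like_question(int n, std::size_t dpitch_doubles) {
    std::vector<double> heap = simulate_heap(n);
    std::vector<double> device(dpitch_doubles * static_cast<std::size_t>(n), -1.0);
    const std::size_t row = sizeof(double) * (static_cast<std::size_t>(n) + 1);
    copy_2d(device.data(), sizeof(double) * dpitch_doubles,
            heap.data(), row, row, static_cast<std::size_t>(n));
    return device;
}
```

For n = 8, row 1 of the result starts with a 0.0 followed by the first eight values of host row 0, and row 2 starts with the ninth value of host row 0, the same drift seen in the output above.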

I hope you can help me solve this. I am very new to CUDA; this is actually my first CUDA program.

0 answers:

No answers