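Hello, I want to create a 2D array of random values (N x (N + 1)) on the host, copy it to the device, and print both the host array and the device array to check that I did it correctly.
The problem is that the host array prints fine, while the device array is missing many values and looks scrambled. I think something is wrong with how I handle the pointers, but I don't know what.
The following code creates the matrix and copies it to the device: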
#define DEC_COUNT (1000)
void create_matrix(cuda_matrix *matrix, int var_cnt, bool clear)
{
    cudaError error;
    double **h_matrix = (double **)malloc(sizeof(double *) * var_cnt);
    assert(h_matrix != NULL);
    if (clear) {
        for (int y = 0; y < var_cnt; y++) {
            h_matrix[y] = (double *)calloc((var_cnt+1), sizeof(double));
            assert(h_matrix[y] != NULL);
        }
    } else {
        for (int y = 0; y < var_cnt; y++) {
            h_matrix[y] = (double *)malloc(sizeof(double) * (var_cnt+1));
            assert(h_matrix[y] != NULL);
            for (int i = 0; i < var_cnt+1; ++i){
                srand(time(NULL)*(i+1)*(y+1));
                h_matrix[y][i] = ((double)rand()/(double)RAND_MAX)*DEC_COUNT;
            }
        }
    }
    printf("h_matrix:\n");
    print_matrix(h_matrix, var_cnt);
    error = cudaMallocPitch(&(matrix->d_matrix), &(matrix->pitch),
                            sizeof(double)*(var_cnt+1), var_cnt);
    checkCudaErrors(error);
    error = cudaMemcpy2D(matrix->d_matrix, matrix->pitch, h_matrix,
                         sizeof(double)*(var_cnt+1), sizeof(double)*(var_cnt+1), var_cnt, cudaMemcpyHostToDevice);
    checkCudaErrors(error);
    printf("d_matrix\n");
    print_matrix<<<1,1>>>(matrix->d_matrix, matrix->var_count, matrix->pitch);
    checkCudaErrors(cudaDeviceSynchronize());
    free_matrix(h_matrix, var_cnt);
}
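For comparison, here is a minimal host-only C sketch of the layout a pitched 2D copy expects. cudaMemcpy2D reads its source as one block of memory whose rows start spitch bytes apart, whereas h_matrix above is an array of pointers to separately allocated rows. The helper names alloc_flat_matrix, copy_2d and pitched_at are hypothetical and not part of CUDA; copy_2d only imitates the row-by-row stepping on the host so it can be checked without a GPU.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate rows x cols doubles as ONE contiguous,
   row-major block, so element (y, x) lives at m[y * cols + x]. */
double *alloc_flat_matrix(size_t rows, size_t cols)
{
    double *m = (double *)calloc(rows * cols, sizeof(double));
    assert(m != NULL);
    return m;
}

/* Hypothetical host-only stand-in for a pitched 2D copy: copies `height`
   rows of `width_bytes` bytes each, stepping the source by `spitch` bytes
   and the destination by `dpitch` bytes per row. */
void copy_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
             size_t width_bytes, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        memcpy((char *)dst + y * dpitch,
               (const char *)src + y * spitch,
               width_bytes);
    }
}

/* Read element (y, x) from a pitched buffer, using the same row
   arithmetic as the print kernel in the question. */
double pitched_at(const void *base, size_t pitch, size_t y, size_t x)
{
    const double *row = (const double *)((const char *)base + y * pitch);
    return row[x];
}
```

With a buffer allocated this way, the existing cudaMemcpy2D call could presumably take the flat pointer as its source with the same pitch and width arguments; whether that fits the rest of the program is an assumption.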
The CUDA print function:
__global__ void print_matrix(double *d_matrix, int height, size_t pitch)
{
    //assert(matrix != NULL);
    /*double *d_matrix = matrix->d_matrix;
    int height = matrix->var_count;
    size_t pitch = matrix->pitch;*/
    for (int j = 0; j < height; j++) {
        // image row
        double *row = (double*)((char*)d_matrix + j * pitch);
        for (int i = 0; i < height+1; i++){
            if (i == height)
                printf("|%.1f", (row[i] == -0.0)? 0.0 : row[i]);
            else
                printf("%.1f ", (row[i] == -0.0)? 0.0 : row[i]);
        }
        printf("\n");
    }
    printf("\n");
}
After running the program I get this output; h_matrix and d_matrix should be identical!
h_matrix:
80.4 465.7 568.3 663.8 554.6 650.4 748.9 642.3 |738.4
465.7 663.8 650.4 642.3 333.0 821.3 309.0 299.4 |495.5
568.3 650.4 738.4 821.3 407.5 495.5 584.6 168.1 |761.1
663.8 642.3 821.3 299.4 487.1 168.1 141.5 829.1 |513.6
554.6 333.0 407.5 487.1 559.5 843.8 414.8 490.9 |566.1
650.4 821.3 495.5 168.1 843.8 513.6 187.8 150.8 |322.8
748.9 309.0 584.6 141.5 414.8 187.8 249.8 523.5 |85.2
642.3 299.4 168.1 829.1 490.9 150.8 523.5 180.4 |344.3
d_matrix
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 |0.0
0.0 80.4 465.7 568.3 663.8 554.6 650.4 748.9 |642.3
738.4 0.0 465.7 663.8 650.4 642.3 333.0 821.3 |309.0
299.4 495.5 0.0 568.3 650.4 738.4 821.3 407.5 |495.5
584.6 168.1 761.1 0.0 663.8 642.3 821.3 299.4 |487.1
168.1 141.5 829.1 513.6 0.0 554.6 333.0 407.5 |487.1
559.5 843.8 414.8 490.9 566.1 0.0 650.4 821.3 |495.5
168.1 843.8 513.6 187.8 150.8 322.8 0.0 748.9 |309.0
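The shifted pattern in d_matrix appears consistent with the copy walking straight through host heap memory, starting at the pointer array. The toy model below assumes (this is allocator-specific and not guaranteed) that each 72-byte row allocation occupies 80 bytes, i.e. 10 double-sized slots, while the copy steps 9 slots per row. build_model_heap and misread_at are hypothetical names, and the values are placeholders rather than the data printed above.

```c
#include <assert.h>
#include <stddef.h>

#define N_ROWS 8                                /* var_cnt in the run shown above */
#define N_COLS (N_ROWS + 1)                     /* doubles per matrix row */
#define SLOT_STRIDE 10                          /* ASSUMED heap spacing between rows, in doubles */
#define HEAP_SLOTS (SLOT_STRIDE * (N_ROWS + 1)) /* size of the toy heap, in doubles */

/* Toy model of the assumed heap: row y's data starts at slot
   SLOT_STRIDE * (y + 1); every other slot (pointer array, padding,
   allocator bookkeeping) is modelled as 0.0.
   Row y, column x holds the placeholder value 100 * (y + 1) + x. */
void build_model_heap(double heap[HEAP_SLOTS])
{
    for (size_t i = 0; i < HEAP_SLOTS; i++)
        heap[i] = 0.0;
    for (size_t y = 0; y < N_ROWS; y++)
        for (size_t x = 0; x < N_COLS; x++)
            heap[SLOT_STRIDE * (y + 1) + x] = 100.0 * (double)(y + 1) + (double)x;
}

/* What a copy that steps N_COLS doubles per row would pick up at (r, c). */
double misread_at(const double *heap, size_t r, size_t c)
{
    assert(N_COLS * r + c < HEAP_SLOTS);
    return heap[N_COLS * r + c];
}
```

In this model, row 1 starts with one non-data slot followed by the first eight values of row 0, and row 2 starts with the last value of row 0, then a non-data slot, then the start of row 1. That is the same drift visible in the printed d_matrix.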
I hope you can help me solve this problem. I am very new to CUDA; this is actually my first CUDA program.