I have code that is supposed to compute Matrix A + Matrix B = Matrix C. I took the code from the NVIDIA Toolkit documentation.
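I tried running it in Visual Studio 2017 Community with CUDA 10.1 installed. It builds and runs without any errors, but the computed result is wrong. Am I missing a library, or is something else the problem? Sorry for the silly question; I have only just started learning CUDA and am trying to understand how the basics work.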
The actual kernel code (not working correctly):
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "stdio.h"
#include "cuda.h"

const int N = 2;

__global__ void MatAdd(int A[N][N], int B[N][N], int C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    int A[N][N] = { 1, 2, 3, 4 };
    int B[N][N] = { 12, 21, 34, 43 };
    int C[N][N];
    dim3 threadsPerBlock(N, N);
    dim3 numBlocks(N / threadsPerBlock.x, N / threadsPerBlock.y);
    MatAdd <<<N, threadsPerBlock >>> (A, B, C);
    printf_s("%d\t%d\n%d\t%d\n", C[0][0], C[0][1], C[1][0], C[1][1]);
}
The expected result is:
13 23
37 47
The actual output is:
-858993460 -858993460
-858993460 -858993460
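
For comparison, here is a minimal sketch of how the same addition might be written so that the kernel works on device memory. It rests on a few assumptions rather than a confirmed diagnosis. The value -858993460 is 0xCCCCCCCC, the pattern MSVC debug builds use to fill uninitialized stack memory, which suggests `C` was never written at all. The likely reasons are that `A`, `B`, and `C` are host arrays passed straight to the kernel, and that the launch uses `N` blocks instead of `numBlocks`. The helper `addOnDevice` is a hypothetical name introduced here for illustration; the bounds guard and the error checks are additions that are not in the code above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

const int N = 2;

// Same element-wise addition as in the question, with a bounds guard added
// so that extra threads (if any) do not write outside the matrices.
__global__ void MatAdd(int A[N][N], int B[N][N], int C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

// Hypothetical helper (not part of the original code): copies the inputs to
// device memory, launches the kernel, and copies the result back to the host.
cudaError_t addOnDevice(int hA[N][N], int hB[N][N], int hC[N][N])
{
    int (*dA)[N] = nullptr;
    int (*dB)[N] = nullptr;
    int (*dC)[N] = nullptr;
    const size_t bytes = N * N * sizeof(int);

    // Allocate device buffers; the kernel cannot dereference host stack arrays.
    cudaError_t err = cudaMalloc((void**)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dC, bytes);

    // Copy the inputs from host to device.
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 threadsPerBlock(N, N);
        // Round up so the grid covers the whole matrix; here this is a 1 x 1 grid.
        dim3 numBlocks((N + threadsPerBlock.x - 1) / threadsPerBlock.x,
                       (N + threadsPerBlock.y - 1) / threadsPerBlock.y);
        MatAdd<<<numBlocks, threadsPerBlock>>>(dA, dB, dC);
        err = cudaGetLastError();  // reports launch failures
    }

    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer performs no operation, so this is safe on any path.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}

int main()
{
    int A[N][N] = { {1, 2}, {3, 4} };
    int B[N][N] = { {12, 21}, {34, 43} };
    int C[N][N] = {};

    cudaError_t err = addOnDevice(A, B, C);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%d\t%d\n%d\t%d\n", C[0][0], C[0][1], C[1][0], C[1][1]);
    return 0;
}
```

Checking the value returned by each runtime call is what makes a failure visible; without those checks a failed launch is silent, which would match the "no errors, wrong result" behaviour described above.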