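I'm trying to run a matrix multiplication program using OpenMP, but I get different output from the serial and parallel versions. I'm testing with 3 × 3 matrices.
My parallel code is: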
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NRA 3//62 /* number of rows in matrix A */
#define NCA 3//15 /* number of columns in matrix A */
#define NCB 3//7  /* number of columns in matrix B */

int main (int argc, char *argv[])
{
    int tid, nthreads, i, j, k, chunk;
    double a[NRA][NCA], /* matrix A to be multiplied */
           b[NCA][NCB], /* matrix B to be multiplied */
           c[NRA][NCB]; /* result matrix C */

    chunk = 10; /* set loop iteration chunk size */

    /*** Spawn a parallel region explicitly scoping all variables ***/
    #pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)
    {
        tid = omp_get_thread_num();
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Starting matrix multiple example with %d threads\n",nthreads);
            printf("Initializing matrices...\n");
        }

        /*** Initialize matrices ***/
        #pragma omp for schedule (static, chunk)
        for (i=0; i<NRA; i++)
            for (j=0; j<NCA; j++)
                a[i][j]= i+j;

        #pragma omp for schedule (static, chunk)
        for (i=0; i<NCA; i++)
            for (j=0; j<NCB; j++)
                b[i][j]= i*j;

        #pragma omp for schedule (static, chunk)
        for (i=0; i<NRA; i++)
            for (j=0; j<NCB; j++)
                c[i][j]= 0;

        /*** Do matrix multiply sharing iterations on outer loop ***/
        /*** Display who does which iterations for demonstration purposes ***/
        printf("Thread %d starting matrix multiply...\n",tid);

        /* With chunk = 10 and only NRA = 3 rows, the whole first chunk
           (all three rows) is assigned to thread 0. */
        #pragma omp for schedule (static, chunk)
        for (i=0; i<NRA; i++)
        {
            printf("Thread=%d did row=%d\n",tid,i);
            for(j=0; j<NCB; j++)
                for (k=0; k<NCA; k++)
                    c[i][j] += a[i][k] * b[k][j];
        }
    } /*** End of parallel region ***/

    /*** Print results: matrix A, matrix B, then the product C ***/
    /* matrix A */
    printf("******************************************************\n");
    printf("Result Matrix:\n");
    for (i=0; i<NRA; i++)
    {
        for (j=0; j<NCB; j++)
            printf("%6.2f ", a[i][j]);
        printf("\n");
    }
    printf("******************************************************\n");

    /* matrix B */
    printf("******************************************************\n");
    printf("Result Matrix:\n");
    for (i=0; i<NRA; i++)
    {
        for (j=0; j<NCB; j++)
            printf("%6.2f ", b[i][j]);
        printf("\n");
    }
    printf("******************************************************\n");

    /* product C = A * B */
    printf("******************************************************\n");
    printf("Result Matrix:\n");
    for (i=0; i<NRA; i++)
    {
        for (j=0; j<NCB; j++)
            printf("%6.2f ", c[i][j]);
        printf("\n");
    }
    printf("******************************************************\n");
    printf ("Done.\n");
    return 0;
}
For the serial version, I just commented out every occurrence of the following line:
#pragma omp for schedule (static, chunk)
The output of my parallel version is:
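Starting matrix multiple example with 12 threads
Initializing matrices...
Thread 0 starting matrix multiply...
Thread 8 starting matrix multiply...
Thread 6 starting matrix multiply...
Thread 9 starting matrix multiply...
Thread 5 starting matrix multiply...
Thread 1 starting matrix multiply...
Thread 4 starting matrix multiply...
Thread 7 starting matrix multiply...
Thread 10 starting matrix multiply...
Thread 3 starting matrix multiply...
Thread 2 starting matrix multiply...
Thread=0 did row=0
Thread=0 did row=1
Thread=0 did row=2
Thread 11 starting matrix multiply...
******************************************************
Result Matrix:
  0.00   1.00   2.00
  1.00   2.00   3.00
  2.00   3.00   4.00
******************************************************
******************************************************
Result Matrix:
  0.00   0.00   0.00
  0.00   1.00   2.00
  0.00   2.00   4.00
******************************************************
******************************************************
Result Matrix:
  0.00   5.00  10.00
  0.00   8.00  16.00
  0.00  11.00  22.00
******************************************************
Done.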
The output of my serial version looks like this:
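Starting matrix multiple example with 12 threads
Initializing matrices...
Thread 0 starting matrix multiply...
Thread 3 starting matrix multiply...
Thread 5 starting matrix multiply...
Thread 11 starting matrix multiply...
Thread 1 starting matrix multiply...
Thread 10 starting matrix multiply...
Thread 2 starting matrix multiply...
Thread 9 starting matrix multiply...
Thread 7 starting matrix multiply...
Thread 8 starting matrix multiply...
Thread 4 starting matrix multiply...
Thread 6 starting matrix multiply...
******************************************************
Result Matrix:
  0.00   1.00   2.00
  1.00   2.00   3.00
  2.00   3.00   4.00
******************************************************
******************************************************
Result Matrix:
  0.00   0.00   0.00
  0.00   1.00   2.00
  0.00   2.00   4.00
******************************************************
******************************************************
Result Matrix:
  0.00  60.00 120.00
  0.00  96.00 192.00
  0.00 132.00 264.00
******************************************************
Done.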
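As a hand check of which result is right, the initialization loops set a[i][k] = i + k and b[k][j] = k·j with all indices running from 0 to 2, so each entry of the product works out as:

```latex
c_{ij} = \sum_{k=0}^{2} (i + k)\,k\,j
       = j\Bigl(i \sum_{k=0}^{2} k + \sum_{k=0}^{2} k^{2}\Bigr)
       = j\,(3i + 5),
\qquad
C = \begin{pmatrix} 0 & 5 & 10 \\ 0 & 8 & 16 \\ 0 & 11 & 22 \end{pmatrix}
```

This matches the parallel run exactly, while every nonzero entry of the serial run is 12 times larger (60 = 12 × 5, 96 = 12 × 8, 132 = 12 × 11), the same as the thread count.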
How can I fix this?
Answer (score: 0)
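I found the error. In the serial version I had not commented things out correctly: I overlooked this line, which still creates a team of threads:
#pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)
With the parallel region still active but every omp for directive commented out, each of the 12 threads ran every loop in full, and each one added its own complete product into the shared matrix c. That is why each nonzero entry of the "serial" result is 12 times the parallel one (60 = 12 × 5, and so on). Because several threads update the same c[i][j] at the same time, this is also a data race, so the factor is not even guaranteed to be exactly 12 on every run. The parallel output was correct all along; commenting out the #pragma omp parallel line as well gives a true serial version.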