I want to start an OpenMP multithreaded region inside one of the processes of my MPI application. For example:
#include <iostream>
#include <omp.h>
#include <mpi.h>
#include <Eigen/Dense>
using std::cin;
using std::cout;
using std::endl;
using namespace Eigen;
int main(int argc, char **argv)
{
    int rank, num_process;
    MatrixXd A = MatrixXd::Ones(8, 4);
    MatrixXd B = MatrixXd::Zero(8, 4);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_process);
    MPI_Status status;
    if (rank == 0)
    {
        int i, j, bnum = 2, brow = 4, thid;
        #pragma omp parallel shared(A, B) private(i, j, brow, bnum, thid) num_threads(2)
        for (i = 0; i < brow; i++)
        {
            for (j = 0; j < 4; j++)
            {
                thid = omp_get_thread_num();
                //cout << "thid " << thid << endl;
                B(thid * brow + i, j) = A(thid * brow + i, j);
            }
        }
        cout << "IN rank 0" << endl;
        cout << B << endl;
        cout << "IN rank 0" << endl;
        MPI_Send(B.data(), 32, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
    }
    else
    {
        MPI_Recv(B.data(), 32, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &status);
        cout << "IN rank 1" << endl;
        cout << B << endl;
        cout << "IN rank 1" << endl;
    }
    MPI_Finalize();
    return 0;
}
In my example code, I want to start 2 threads to copy data from matrix A to matrix B, and my machine has 4 cores. But when I run the program, matrix B only gets half of the data.
$ mpirun -n 2 ./shareMem
IN rank 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
IN rank 0
IN rank 1
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
IN rank 1
$ mpirun -n 4 ./shareMem # it just hang on and doesn't exit
IN rank 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
IN rank 0
IN rank 1
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
IN rank 1
The output I expect is
$ mpirun -n 2 ./shareMem
IN rank 0
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
IN rank 0
IN rank 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
IN rank 1
How can I fix it and run 2 threads in my code? Thanks!
Answer 0 (score: 1)
The compiler did not catch the typo in the word "parallel".
PS: I don't have enough reputation to add a comment.
Answer 1 (score: 1)
Change
#pragma omp parallel shared(A, B) private(i, j, brow, bnum, thid) num_threads(2)
to
#pragma omp parallel shared(A, B) private(i, j, thid) num_threads(2)
brow and bnum need to be shared variables. By adding the names bnum and brow to the private clause, you create a new automatic variable with each of those names in every thread, and by default they are uninitialized. So each thread reads garbage loop bounds, and the copy loop does not run as intended.