The problem I am running into is similar to the one discussed in this topic: I have an MPI code that sums the rows of a vector with a given number of rows. I have attached the code here.
When I run it with a single core, mpirun -n 1 ./program, I get:
500000 sum 125000250000.00000 calculated by root process.
The grand total is: 125000250000.00000
That looks fine, since only one core computes the sum. But when I try it with multiple cores, mpirun -n 4 ./program, I get:
please enter the number of numbers to sum:
500000
[federico-C660:9540] *** An error occurred in MPI_Recv
[federico-C660:9540] *** on communicator MPI_COMM_WORLD
[federico-C660:9540] *** MPI_ERR_TRUNCATE: message truncated
[federico-C660:9540] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
sum 7812562500.0000000 calculated by root process.
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 9539 on
node XXXXX1 exiting without calling "finalize".
I also asked a similar question for the C program here. The same thing happens with 2 and 3 processors.
Can someone help me figure out what the problem is? My guess is that I made a mistake in the MPI_RECV calls related to the "sender".
Answer 0 (score: 3)
There are a couple of issues in the code: the receives posted on the worker side have to match the root's sends in both count and datatype (MPI_ERR_TRUNCATE means an incoming message was larger than the receive posted for it). The two receives should be:
    CALL mpi_recv (num_rows_to_receive, 1, mpi_integer,       &
                   root_process, mpi_any_tag, mpi_comm_world, &
                   STATUS, ierr)
    CALL mpi_recv (vector2, num_rows_to_receive, mpi_real8,   &
                   root_process, mpi_any_tag, mpi_comm_world, &
                   STATUS, ierr)
This should fix the error.
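Since the original attachment is not reproduced here, the following is a minimal, self-contained sketch of the overall pattern the fix plugs into: the root keeps the first chunk, sends each worker a row count followed by the rows, and collects the partial sums. Only num_rows_to_receive, vector2, root_process and the two receive calls above come from the original; every other name and the chunking scheme are assumptions for illustration.

    program sum_vector
      use mpi
      implicit none
      integer, parameter :: root_process = 0
      integer :: ierr, my_rank, nprocs, status(mpi_status_size)
      integer :: n, chunk, start, num_rows_to_receive, i, p
      real(8), allocatable :: vector(:), vector2(:)
      real(8) :: partial_sum, total

      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world, my_rank, ierr)
      call mpi_comm_size(mpi_comm_world, nprocs, ierr)

      if (my_rank == root_process) then
         print *, 'please enter the number of numbers to sum:'
         read *, n
         allocate(vector(n))
         vector = [(real(i, 8), i = 1, n)]   ! 1, 2, ..., n

         chunk = n / nprocs                  ! assumes n >= nprocs
         do p = 1, nprocs - 1
            start = p * chunk + 1
            ! the last worker also takes any remainder rows
            num_rows_to_receive = merge(n - start + 1, chunk, p == nprocs - 1)
            ! send each worker its row count first, then the rows themselves
            call mpi_send(num_rows_to_receive, 1, mpi_integer, p, 0, &
                          mpi_comm_world, ierr)
            call mpi_send(vector(start), num_rows_to_receive, mpi_real8, p, 0, &
                          mpi_comm_world, ierr)
         end do

         total = sum(vector(1:chunk))
         print *, 'sum', total, 'calculated by root process.'
         do p = 1, nprocs - 1
            call mpi_recv(partial_sum, 1, mpi_real8, mpi_any_source, &
                          mpi_any_tag, mpi_comm_world, status, ierr)
            print *, 'partial sum', partial_sum, 'returned from process', &
                     status(mpi_source)
            total = total + partial_sum
         end do
         print *, 'The grand total is:', total
      else
         ! both receives must match the root's sends in count and
         ! datatype, or mpi_recv fails with MPI_ERR_TRUNCATE
         call mpi_recv(num_rows_to_receive, 1, mpi_integer, root_process, &
                       mpi_any_tag, mpi_comm_world, status, ierr)
         allocate(vector2(num_rows_to_receive))
         call mpi_recv(vector2, num_rows_to_receive, mpi_real8, root_process, &
                       mpi_any_tag, mpi_comm_world, status, ierr)
         partial_sum = sum(vector2)
         call mpi_send(partial_sum, 1, mpi_real8, root_process, 0, &
                       mpi_comm_world, ierr)
      end if

      call mpi_finalize(ierr)
    end program sum_vector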
~/temp$ mpirun -n 8 ./a.out
please enter the number of numbers to sum:
500000
sum 1953156250.0000000 calculated by root process.
partial sum 5859406250.0000000 returned from process 1
partial sum 9765656250.0000000 returned from process 2
partial sum 17578156250.000000 returned from process 4
partial sum 21484406250.000000 returned from process 5
partial sum 13671906250.000000 returned from process 3
partial sum 25390656250.000000 returned from process 6
partial sum 29296906250.000000 returned from process 7
The grand total is: 125000250000.00000
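As a footnote on the design: once each rank holds its partial sum, the hand-rolled collection loop can be replaced by a single collective call. A minimal sketch, reusing the names from the program above:

    ! combine every rank's partial_sum into total on the root;
    ! this replaces the explicit mpi_recv loop over the workers
    call mpi_reduce(partial_sum, total, 1, mpi_real8, mpi_sum, &
                    root_process, mpi_comm_world, ierr)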