C++ and MPI: how do I write part of the code to run in parallel?

Time: 2012-07-13 18:50:59

Tags: c++ parallel-processing mpi



#include <ctime>
#include <iostream>
#include <string>
#include <petscmat.h>

using namespace std;

int main(int argc, char** argv)
{
  time_t rawtime;
  time ( &rawtime );
  string sta = ctime (&rawtime);
  cout << "Solving began..." << endl;

  PetscInitialize(&argc, &argv, 0, 0);

  Mat            A;        /* linear system matrix */
  PetscInt       i,j,Ii,J,Istart,Iend,m = 120000,n = 3,its;
  PetscErrorCode ierr;
  PetscBool      flg = PETSC_FALSE;
  PetscScalar    v;
#if defined(PETSC_USE_LOG)
  PetscLogStage  stage;
#endif

  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
         Compute the matrix and right-hand-side vector that define
         the linear system, Ax = b.
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  /*
     Create parallel matrix, specifying only its global dimensions.
     When using MatCreate(), the matrix format can be specified at
     runtime. Also, the parallel partitioning of the matrix is
     determined by PETSc at runtime.

     Performance tuning note:  For problems of substantial size,
     preallocation of matrix memory is crucial for attaining good
     performance. See the matrix chapter of the users manual for details.
  */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A,5,PETSC_NULL,5,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A,5,PETSC_NULL);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);

  /*
     Currently, all PETSc parallel matrix formats are partitioned by
     contiguous chunks of rows across the processors.  Determine which
     rows of the matrix are locally owned.
  */
  ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);

  /*
     Set matrix elements for the 2-D, five-point stencil in parallel.
      - Each processor needs to insert only elements that it owns
        locally (but any non-local elements will be sent to the
        appropriate processor during matrix assembly).
      - Always specify global rows and columns of matrix entries.

     Note: this uses the less common natural ordering that orders first
     all the unknowns for x = h then for x = 2h etc; Hence you see J = Ii +- n
     instead of J = I +- m as you might expect. The more standard ordering
     would first do all variables for y = h, then y = 2h etc.
  */
  PetscMPIInt    rank;        // processor rank
  PetscMPIInt    size;        // size of communicator
  MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
  MPI_Comm_size(PETSC_COMM_WORLD,&size);

  cout << "Rank = " << rank << endl;
  cout << "Size = " << size << endl;

  cout << "Generating 2D-Array" << endl;

  static double temp2D[120000][3];   // static: too large to place safely on the stack
  for (Ii=Istart; Ii<Iend; Ii++) {
    for(J=0; J<n;J++){
      temp2D[Ii][J] = 1;
    }
  }
  cout << "Processor " << rank << " set values : " << Istart << " - " << Iend << " into 2D-Array" << endl;

  v = -1.0;
  for (Ii=Istart; Ii<Iend; Ii++) {
    for(J=0; J<n;J++){
      /* matrix entries for (Ii, J) are set here */
    }
  }
  cout << "Ii = " << Ii << " processor " << rank << " and it owns: " << Istart << " - " << Iend << endl;

  /*
     Assemble matrix, using the 2-step process:
       MatAssemblyBegin(), MatAssemblyEnd()
     Computations can be done while messages are in transition
     by placing code between these two statements.
  */
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();

  cout << "no more MPI" << endl;
  return 0;
}



I can run my test program as mpiexec -n 4 test and it runs successfully, but for some reason I have to run my real program as mpiexec -n 4 ./myprog


Solving began...
Solving began...
Solving began...
Solving began...
Rank = 0
Size = 4
Generating 2D-Array
Processor 0 set values : 0 - 30000 into 2D-Array
Rank = 2
Size = 4
Generating 2D-Array
Processor 2 set values : 60000 - 90000 into 2D-Array
Rank = 3
Size = 4
Generating 2D-Array
Processor 3 set values : 90000 - 120000 into 2D-Array
Rank = 1
Size = 4
Generating 2D-Array
Processor 1 set values : 30000 - 60000 into 2D-Array
Ii = 30000 processor 0 and it owns: 0 - 30000
Ii = 90000 processor 2 and it owns: 60000 - 90000
Ii = 120000 processor 3 and it owns: 90000 - 120000
Ii = 60000 processor 1 and it owns: 30000 - 60000
no more MPI
no more MPI
no more MPI
no more MPI

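Edit after two comments: My goal is to run this on a small cluster with 20 nodes, each with 2 cores. Later it should run on a supercomputer, so MPI is definitely the way I need to go. I am currently testing on two different machines: one has 1 processor / 4 cores, the other has 4 processors / 16 cores.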

2 Answers:

Answer 0 (score: 5)

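MPI is an implementation of the SPMD/MPMD model (single program, multiple data / multiple programs, multiple data). An MPI job consists of concurrently running processes that exchange messages with one another in order to cooperate on solving a problem. You cannot run only part of the code in parallel: every process executes the whole program. You can only have parts of the code that do not communicate with each other yet still execute concurrently. You should use mpirun or mpiexec to start your application in parallel mode.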

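If you only want to make part of your code parallel and can live with the restriction of running on a single machine, then what you need is OpenMP rather than MPI. Alternatively, you can use low-level POSIX threads programming, since according to the PETSc website it supports pthreads. OpenMP is typically built on top of pthreads, so using PETSc with OpenMP may be possible.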
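To illustrate what "only part of the code is parallel" could look like with OpenMP, here is a minimal sketch. The function name `sum_of_squares` is made up for this example and has nothing to do with PETSc. The first loop runs serially; only the second loop is shared among threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of PETSc or the question's code):
// fills a vector serially, then sums the squares in an OpenMP parallel loop.
double sum_of_squares(std::size_t n) {
    assert(n > 0);
    std::vector<double> x(n);

    // Serial part: executed by a single thread.
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = static_cast<double>(i);
    }

    double total = 0.0;
    const long long count = static_cast<long long>(n);

    // Parallel part: iterations are divided among threads, and the
    // reduction clause combines the per-thread partial sums.
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < count; ++i) {
        const double xi = x[static_cast<std::size_t>(i)];
        total += xi * xi;
    }
    return total;
}
```

With GCC or Clang this would typically be compiled with `-fopenmp`. Without that flag the pragma is ignored and the loop simply runs serially with the same result. The program is started as a normal executable, not through mpiexec.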

Answer 1 (score: 1)


When you launch x MPI processes, you get x copies of exactly the same program running. You need something like

if (rank == 0) {
    // do something
} else {
    // do something else
}

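to make the different processes do different things. The processes can communicate with each other by sending messages, but they all run the same binary. If your code never diverges on the rank, you simply get x copies of the same program producing the same result x times.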
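The following sketch shows how the same code can do different work depending on the rank. To keep it compilable without an MPI installation, the MPI calls are deliberately left out: in a real program `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and the partial results would be combined with something like `MPI_Reduce`. The names `owned_range`, `local_sum` and `simulate_all_ranks` are hypothetical, and the even block split is an assumption that happens to reproduce the ranges printed in the question (0 - 30000, 30000 - 60000, ...).

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: contiguous block [start, end) of n items owned by
// `rank` when the items are split as evenly as possible over `size` ranks.
std::pair<long, long> owned_range(long n, int size, int rank) {
    assert(size > 0 && rank >= 0 && rank < size);
    const long base  = n / size;   // items every rank gets
    const long extra = n % size;   // the first `extra` ranks get one more
    const long start = rank * base + (rank < extra ? rank : extra);
    const long end   = start + base + (rank < extra ? 1 : 0);
    return {start, end};
}

// What each process would execute: identical code, but the rank selects
// a different slice of the work (here: summing the integers in its range).
long local_sum(long n, int size, int rank) {
    const std::pair<long, long> r = owned_range(n, size, rank);
    long sum = 0;
    for (long i = r.first; i < r.second; ++i) {
        sum += i;
    }
    return sum;
}

// Serial stand-in for the reduction step: visits every rank in turn and
// adds up the partial sums, as a reduce onto one rank would.
long simulate_all_ranks(long n, int size) {
    long total = 0;
    for (int rank = 0; rank < size; ++rank) {
        total += local_sum(n, size, rank);
    }
    return total;
}
```

For example, `owned_range(120000, 4, 1)` yields the pair (30000, 60000), and `simulate_all_ranks(100, 4)` adds up 0 through 99. If every process looped over the full range instead of its own slice, all of them would compute the same total, which is the "same result x times" situation described above.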