MPI_Wtime timer runs about 2x faster in OpenMPI 2.0.2

Posted: 2017-02-16 10:17:41

Tags: mpi openmpi

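After updating OpenMPI from 1.8.4 to 2.0.2, I get wrong time measurements from MPI_Wtime(). With version 1.8.4 the results matched those returned by the omp_get_wtime() timer; now MPI_Wtime runs about 2 times faster, i.e. it reports roughly twice the elapsed time.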

What could cause this behavior?

My example code:

#include <omp.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int some_work(int rank, int tid){
  int count = 10000;
  int arr[count];
  for( int i=0; i<count; i++)
    arr[i] = i + tid + rank;
  for( int val=0; val<4000000; val++)
    for(int i=0; i<count-1; i++)
      arr[i] = arr[i+1];

  return arr[0];
}


int main (int argc, char *argv[]) {

  MPI_Init(NULL, NULL);
  int rank, size;

  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0)
    printf("there are %d mpi processes\n", size);

  MPI_Barrier(MPI_COMM_WORLD);

  double omp_time1 = omp_get_wtime();
  double mpi_time1 = MPI_Wtime();
  #pragma omp parallel 
  {
    int tid = omp_get_thread_num();
    if ( tid == 0 ) {
      int nthreads = omp_get_num_threads();
      printf("There are %d threads for process %d\n", nthreads, rank);
      int result = some_work(rank, tid);
      printf("result for process %d thread %d is %d\n", rank, tid, result);
    }
  }

  MPI_Barrier(MPI_COMM_WORLD);
  double mpi_time2 = MPI_Wtime();
  double omp_time2 = omp_get_wtime();
  printf("process %d omp time: %f\n", rank, omp_time2 - omp_time1);
  printf("process %d mpi time: %f\n", rank,  mpi_time2 - mpi_time1);
  printf("process %d ratio: %f\n", rank, (mpi_time2 - mpi_time1)/(omp_time2 - omp_time1) );

  MPI_Finalize();

  return EXIT_SUCCESS;
}

Compiling:

g++ -O3 src/example_main.cpp -o bin/example -fopenmp -I/usr/mpi/gcc/openmpi-2.0.2/include -L /usr/mpi/gcc/openmpi-2.0.2/lib -lmpi

Running:

salloc -N2 -n2 mpirun --map-by ppr:1:node:pe=16 bin/example 

gives something like:
there are 2 mpi processes
There are 16 threads for process 0
There are 16 threads for process 1
result for process 1 thread 0 is 10000
result for process 0 thread 0 is 9999
process 1 omp time: 5.066794
process 1 mpi time: 10.098752
process 1 ratio: 1.993125
process 0 omp time: 5.066816
process 0 mpi time: 8.772390
process 0 ratio: 1.731342

Here the ratio does not match the ~2x I stated at first (process 0 shows 1.73), but it is still large.

With OpenMPI 1.8.4 the results are fine:

g++ -O3 src/example_main.cpp -o bin/example -fopenmp -I/usr/mpi/gcc/openmpi-1.8.4/include -L /usr/mpi/gcc/openmpi-1.8.4/lib -lmpi -lmpi_cxx

gives:

result for process 0 thread 0 is 9999
result for process 1 thread 0 is 10000
process 0 omp time: 4.655244
process 0 mpi time: 4.655232
process 0 ratio: 0.999997
process 1 omp time: 4.655335
process 1 mpi time: 4.655321
process 1 ratio: 0.999997

2 answers:

Answer 0 (score: 2)

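I see similar behavior on my cluster (same OpenMPI version as yours, 2.0.2). The problem was the default CPU frequency governor, "conservative". Once I set the governor to "performance", the output of MPI_Wtime() lined up with the correct timings (in my case, the output of 'time'). It seems that on some Xeon processors certain clock functions drift when an overly aggressive dynamic frequency scaling policy is in use; the same OpenMPI version does not show this problem on the newer Xeons in the same cluster.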

Answer 1 (score: 0)

Could MPI_Wtime() itself be an expensive operation? If you avoid counting the time spent in MPI_Wtime() as part of the OpenMP time, are the results more consistent? E.g.:

double mpi_time1 = MPI_Wtime();
double omp_time1 = omp_get_wtime();
/* do something */
double omp_time2 = omp_get_wtime();
double mpi_time2 = MPI_Wtime();