Question

I ran into the following problem. I perform some computation and measure its execution time. My code looks like this:

void do_some_work()
{
  ...
}

int main()
{
  double tSt;
  double tFn;

//first do_work block
  tSt = omp_get_wtime();
  do_some_work();
  tFn = omp_get_wtime();
  cout << "1st work done for " << (tFn-tSt) << " s." << endl;

//second do_work block
  int cpus = 4;
  for(size_t cpuN=0; cpuN<cpus; ++cpuN)
  {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpuN, &set);
    sched_setaffinity(0, sizeof(set), &set);

    tSt = omp_get_wtime();
    do_some_work();
    tFn = omp_get_wtime();
    cout << cpuN << " cpu - work done for " << (tFn-tSt) << " s." << endl;
  }

//third do_work block
  tSt = omp_get_wtime();
  do_some_work();
  tFn = omp_get_wtime();
  cout << "3rd work done for " << (tFn-tSt) << " s." << endl;
}

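In reality I use this with OpenMP and run it in parallel. This is a simplified version that reproduces the problem.

The output looks like this: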

1st work done for 0.100 s.

0 cpu - work done for 0.200 s.

1 cpu - work done for 0.200 s.

2 cpu - work done for 0.200 s.

3 cpu - work done for 0.200 s.

3rd work done for 0.200 s.

But sometimes it looks like this:

1st work done for 0.100 s.

0 cpu - work done for 0.200 s.

1 cpu - work done for 0.200 s.

2 cpu - work done for 0.100 s.

3 cpu - work done for 0.200 s.

3rd work done for 0.200 s.

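So my questions are:

1) Why does the execution time increase so sharply when I bind the thread to a particular CPU?

2) Why is the execution time of the 3rd call still bad?

3) Why is it sometimes normal (this can happen on any CPU)?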
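A side note relevant to question 2: the mask set by sched_setaffinity stays in effect until it is changed again, so in the code above the third block still runs pinned to the last CPU of the loop. Whether that accounts for the timing is not established here. A minimal sketch of saving and restoring the mask around a pinned run follows; `run_pinned` is a hypothetical helper name, not part of any library, and the code assumes Linux with glibc:

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, cpu_set_t (Linux/glibc)

// Hypothetical helper: pin the calling thread to `cpu`, run `work`,
// then restore the affinity mask that was active before the call.
// Returns false if any of the affinity calls fails.
// Note: if `work` throws, the mask is not restored in this sketch.
template <class F>
bool run_pinned(int cpu, F&& work)
{
    cpu_set_t saved;
    if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
        return false;

    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    if (sched_setaffinity(0, sizeof(one), &one) != 0)
        return false;

    work();

    return sched_setaffinity(0, sizeof(saved), &saved) == 0;
}
```

In the loop this would be called as `run_pinned(cpuN, [&]{ do_some_work(); });`, after which the third block runs with the original mask again and can be compared with the first block on equal terms.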

Update: an executable example:

#include <iostream>
#include <omp.h>
#include <sched.h>
#include <fftw3.h>

using namespace std;

int main()
{
  size_t N = 100000;

  fftwf_complex *in = (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex) * N);
  fftwf_complex *out = (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex) * N);

  fftwf_plan fft = fftwf_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_MEASURE);


  double tSt;
  double tFn;

//first do_work block
  tSt = omp_get_wtime();
  fftwf_execute(fft);
  tFn = omp_get_wtime();
  cout << "1st work done for " << (tFn-tSt) << " s." << endl;

//second do_work block
  // int cpus = omp_get_num_procs();
  int cpus = 4;
  for(size_t cpuN=0; cpuN<cpus; ++cpuN)
  {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpuN, &set);
    sched_setaffinity(0, sizeof(set), &set);

    tSt = omp_get_wtime();
    fftwf_execute(fft);
    tFn = omp_get_wtime();
    cout << cpuN << " cpu - work done for " << (tFn-tSt) << " s." << endl;
  }

//third do_work block
  tSt = omp_get_wtime();
  fftwf_execute(fft);
  tFn = omp_get_wtime();
  cout << "3rd work done for " << (tFn-tSt) << " s." << endl;

  fftwf_free(in);
  fftwf_free(out);
  fftwf_destroy_plan(fft);
}

Compile: g++ -std=c++14 -O3 test.cpp -lfftw3f -fopenmp

Output (Xeon E5-2695 v4):

1st work done for 0.000572872 s.

0 cpu - work done for 0.00124588 s.

1 cpu - work done for 0.00131525 s.

2 cpu - work done for 0.00131468 s.

3 cpu - work done for 0.00129492 s.

3rd work done for 0.00125334 s.

Output (i7-6500u):

1st work done for 0.000400588 s.

0 cpu - work done for 0.00067104 s.

1 cpu - work done for 0.000504862 s.

2 cpu - work done for 0.000536624 s.

3 cpu - work done for 0.000439311 s.

3rd work done for 0.000432347 s.
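Each figure above comes from a single call that lasts on the order of a millisecond, so some of the spread may be measurement noise. A sketch of a repeated measurement that would make the per-CPU numbers easier to compare is given below. `time_runs` is a hypothetical helper, and it uses std::chrono::steady_clock rather than omp_get_wtime only so that it builds without -fopenmp:

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical helper: call `work` once untimed as a warm-up, then time
// `reps` further calls individually. Returns the per-call durations in
// seconds, sorted ascending (front() is the minimum, the middle element
// is the median).
template <class F>
std::vector<double> time_runs(int reps, F&& work)
{
    using clock = std::chrono::steady_clock;
    reps = std::max(reps, 0);

    work();  // warm-up run, not timed

    std::vector<double> secs;
    secs.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        const auto t0 = clock::now();
        work();
        const auto t1 = clock::now();
        secs.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    std::sort(secs.begin(), secs.end());
    return secs;
}
```

For example, `auto s = time_runs(100, [&]{ fftwf_execute(fft); });` after each sched_setaffinity call, then printing `s.front()` and `s[s.size() / 2]`, would show whether the pinned runs are consistently slower or only occasionally so.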

sched_setaffinity() with fftw3

0 Answers: