Question

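I am writing a simple benchmark application to evaluate and compare the multithreading performance of pthreads and OpenMP. For this I use a Monte Carlo method to compute Pi and distribute the iterations across several threads.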

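I wrote a small, simple wrapper class around the pthreads interface. All it does is create the thread when the object is constructed and store the resulting thread ID.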
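For readers who want something concrete to picture, here is a minimal sketch of what such a wrapper might look like. The real class is not shown in this question, so everything here is an assumption: the member names, the `entry_fn` alias, and the decision to make the class move-only are illustrative choices, with only the constructor arguments and `get_id()` taken from how the wrapper is used further down.

```cpp
#include <pthread.h>
#include <stdexcept>

// Hypothetical stand-in for the wrapper described above; the real class is not shown.
class Thread {
public:
    using entry_fn = void* (*)(void*);

    // Start the thread as soon as the object is constructed.
    Thread(entry_fn fn, void* arg) {
        if (pthread_create(&id_, nullptr, fn, arg) != 0) {
            throw std::runtime_error("pthread_create failed");
        }
        started_ = true;
    }

    // Copying would leave two objects referring to one thread, so forbid it.
    Thread(const Thread&) = delete;
    Thread& operator=(const Thread&) = delete;

    // Moving transfers the ID; the moved-from object no longer refers to a thread.
    Thread(Thread&& other) noexcept : id_(other.id_), started_(other.started_) {
        other.started_ = false;
    }

    // The destructor does nothing on purpose: the caller joins via get_id().
    ~Thread() = default;

    pthread_t get_id() const { return id_; }

private:
    pthread_t id_{};
    bool started_ = false;
};
```

Making the class move-only matters when objects are stored in a `std::vector`, because the vector may relocate its elements as it grows. If a wrapper were copyable and its destructor cancelled or detached the thread, a relocation could affect threads that are already running.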

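Now to my problem: when more than one thread is spawned, the for loop inside the thread function apparently does not execute. I tested this by incrementing a variable inside the loop; it stays at zero for every thread except the last one. Strangely, this does not happen on every run of the program. Adding a delay between thread creations fixes the problem, but I really do not understand why.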

void* monteCarlo_intern(void* args){
    struct thread_arguments* arguments = (struct thread_arguments*) args;

    long unsigned int total = 0;
    long unsigned int inside = 0;

    // One generator and distribution per thread, seeded from std::random_device
    thread_local std::mt19937_64 generator(std::random_device{}());
    thread_local std::uniform_real_distribution<double> distribution(0.0, 1.0);

    double d1 = 0, d2 = 0;
    for(total = 0; total < arguments->iterations; total++) {
        d1 = distribution(generator);
        d2 = distribution(generator);

        // Count points that fall inside the unit quarter circle
        if((d1*d1 + d2*d2) <= 1.0) {inside++;}
    }
    *arguments->inside = inside * 4;
    return NULL;    // a function returning void* must return a value
}

double calcPI_monteCarlo(struct for_arguments args_in){

  std::vector <long unsigned int> inside_vec(args_in.no_threads - 1, 0);  //Vector to hold results per thread
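  std::vector<Thread> threads;  //Vector to hold threads
  threads.reserve(args_in.no_threads - 1);  //Reserve up front so emplace_back never relocates Thread objects whose threads are already running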
  const unsigned long int iters_per_t = args_in.iterations / args_in.no_threads;

  std::vector<thread_arguments> args(args_in.no_threads - 1, {NULL, iters_per_t});

  for (unsigned int i = 0; i < args_in.no_threads - 1; i++)
  {
    args[i].inside = &inside_vec[i];                        //Assign results-vector address to thread to capture result
    threads.emplace_back(monteCarlo_intern, &args[i]);      //Create thread and store it in vector
  }

  long unsigned int inside = 0;
  thread_arguments local_args = {&inside, iters_per_t};
  monteCarlo_intern(&local_args);    //Do the same calculation locally to reduce number of spawned threads

  for (unsigned int i = 0; i < args_in.no_threads - 1; i++)
  {
    pthread_join(threads[i].get_id(), NULL);
    inside += inside_vec[i];
  }

  long unsigned int total = iters_per_t * args_in.no_threads;
  return (double)inside / total;
}
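One way to narrow this down is to run the same calculation with `pthread_create` called directly, so the wrapper class is out of the picture. The following is a minimal sketch under that assumption, not the code from my benchmark: `WorkerArgs`, `worker`, and `estimate_pi` are hypothetical names, and the inline fallback when a thread cannot be created is an added choice.

```cpp
#include <pthread.h>
#include <random>
#include <vector>

// Hypothetical names throughout; this mirrors the layout above without the wrapper.
struct WorkerArgs {
    unsigned long iterations;  // samples this worker draws
    unsigned long inside;      // hits inside the quarter circle (output)
};

void* worker(void* p) {
    WorkerArgs* a = static_cast<WorkerArgs*>(p);
    std::mt19937_64 gen(std::random_device{}());
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    unsigned long hits = 0;
    for (unsigned long i = 0; i < a->iterations; ++i) {
        const double x = dist(gen);
        const double y = dist(gen);
        if (x * x + y * y <= 1.0) ++hits;
    }
    a->inside = hits;
    return nullptr;
}

double estimate_pi(unsigned no_threads, unsigned long iterations) {
    if (no_threads == 0) return 0.0;
    const unsigned long per_thread = iterations / no_threads;

    // Both vectors are fully sized before any thread starts, so no element
    // is relocated while a worker holds a pointer to it.
    std::vector<WorkerArgs> args(no_threads, WorkerArgs{per_thread, 0});
    std::vector<pthread_t> ids(no_threads - 1);
    std::vector<bool> spawned(no_threads - 1, false);

    for (unsigned i = 0; i + 1 < no_threads; ++i) {
        spawned[i] = pthread_create(&ids[i], nullptr, worker, &args[i]) == 0;
        if (!spawned[i]) worker(&args[i]);  // fall back to running inline
    }
    worker(&args[no_threads - 1]);  // the calling thread takes the last share

    unsigned long inside = 0;
    for (unsigned i = 0; i + 1 < no_threads; ++i) {
        if (spawned[i]) pthread_join(ids[i], nullptr);
        inside += args[i].inside;
    }
    inside += args[no_threads - 1].inside;

    const unsigned long total = per_thread * no_threads;
    if (total == 0) return 0.0;
    return 4.0 * static_cast<double>(inside) / static_cast<double>(total);
}
```

If this version gives a non-zero count for every worker on every run, the difference most likely lies in how the wrapper objects are created, copied, or destroyed rather than in the Monte Carlo loop itself.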

Does a multithreaded Monte Carlo pi calculation lose precision with more threads?

0 answers: