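I am writing a simple benchmark application to evaluate and compare the multithreading performance of pthreads and OpenMP. To do so, I use the Monte Carlo method to calculate Pi, distributing the iterations across multiple threads.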
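For reference, the estimate relies on an area ratio: for points drawn uniformly from the unit square, the fraction that lands inside the quarter of the unit circle approaches π/4. The symbols N_inside and N_total below are just labels for the two counters kept in the code (`inside` and `total`):

```latex
\frac{N_{\text{inside}}}{N_{\text{total}}} \approx \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{N_{\text{inside}}}{N_{\text{total}}}
```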
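I wrote a small, simple wrapper class around the pthreads interface. It does nothing more than create the thread when the object is constructed and store the thread ID.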
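The wrapper class itself is not shown in this post. As a rough sketch of what such a class could look like (an assumption for illustration, not the actual implementation), the following starts the thread in the constructor and exposes the stored handle through `get_id()`, which is the only member the code further down relies on. The alias `entry_fn` and the member `id_` are hypothetical names.

```cpp
#include <pthread.h>
#include <stdexcept>

// Hypothetical stand-in for the wrapper described above; the real class is not shown.
class Thread {
public:
    using entry_fn = void* (*)(void*);

    // Start the thread immediately when the object is constructed.
    Thread(entry_fn fn, void* arg) {
        if (pthread_create(&id_, nullptr, fn, arg) != 0) {
            throw std::runtime_error("pthread_create failed");
        }
    }

    // Return the stored pthread handle so the caller can join on it.
    pthread_t get_id() const { return id_; }

private:
    pthread_t id_{};
};
```

This sketch deliberately has no destructor, so copying or moving a `Thread` inside a `std::vector` only copies the handle. A wrapper that joins, detaches, or cancels in its destructor would behave differently when the vector reallocates.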
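Now to my problem: I found that if more than one thread is spawned, the for loop inside the thread function does not seem to be executed. I tested this by incrementing a variable inside the loop; for all threads except the last one, it stays at zero. Strangely, this does not happen on every run of the program. Adding a delay between the creation of the threads resolves the issue, but I really don't understand why.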
void* monteCarlo_intern(void* args){
    struct thread_arguments* arguments = (struct thread_arguments*) args;
    long unsigned int total=0;
    long unsigned int inside=0;
    thread_local std::mt19937_64 generator(std::random_device{}());
    thread_local std::uniform_real_distribution<double> distribution(0.0, 1.0);
    double d1 = 0, d2 = 0;
    for(total = 0; total < arguments->iterations; total++) {
        d1 = distribution(generator);
        d2 = distribution(generator);
        if((d1*d1 + d2*d2) <= 1.0) {inside++;}
    }
    *arguments->inside = inside * 4;
}

double calcPI_monteCarlo(struct for_arguments args_in){
    std::vector<long unsigned int> inside_vec(args_in.no_threads - 1, 0); //Vector to hold results per thread
    std::vector<Thread> threads; //Vector to hold threads
    const unsigned long int iters_per_t = args_in.iterations / args_in.no_threads;
    std::vector<thread_arguments> args(args_in.no_threads - 1, {NULL, iters_per_t});
    for (unsigned int i = 0; i < args_in.no_threads - 1; i++)
    {
        args[i].inside = &inside_vec[i]; //Assign results-vector address to thread to capture result
        threads.emplace_back(monteCarlo_intern, &args[i]); //Create thread and store it in vector
    }
    long unsigned int inside = 0;
    thread_arguments local_args = {&inside, iters_per_t};
    monteCarlo_intern(&local_args); //Do the same calculation locally to reduce number of spawned threads
    for (unsigned int i = 0; i < args_in.no_threads - 1; i++)
    {
        pthread_join(threads[i].get_id(), NULL);
        inside += inside_vec[i];
    }
    long unsigned int total = iters_per_t * args_in.no_threads;
    return (double)inside / total;
}
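
One way to cross-check whether the behaviour comes from the wrapper or from the partitioning itself would be to run the same split on `std::thread`. The following is only a minimal sketch under stated assumptions, not the code being benchmarked: `count_inside` and `calc_pi_std_thread` are hypothetical names, fixed seeds replace `std::random_device` so that runs are repeatable, and `no_threads` is assumed to be at least 1.

```cpp
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Count how many of `iterations` random points fall inside the unit quarter circle.
static std::uint64_t count_inside(std::uint64_t iterations, std::uint64_t seed) {
    std::mt19937_64 generator(seed);
    std::uniform_real_distribution<double> distribution(0.0, 1.0);
    std::uint64_t inside = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        const double x = distribution(generator);
        const double y = distribution(generator);
        if (x * x + y * y <= 1.0) { ++inside; }
    }
    return inside;
}

// Hypothetical comparison version: no_threads - 1 workers plus one share
// computed on the calling thread, mirroring the split used above.
double calc_pi_std_thread(std::uint64_t iterations, unsigned no_threads) {
    const std::uint64_t iters_per_t = iterations / no_threads;
    std::vector<std::uint64_t> inside_vec(no_threads - 1, 0);
    std::vector<std::thread> threads;
    threads.reserve(no_threads - 1);

    for (unsigned i = 0; i + 1 < no_threads; ++i) {
        // Each worker writes only to its own slot, so no locking is needed.
        threads.emplace_back([&inside_vec, i, iters_per_t] {
            inside_vec[i] = count_inside(iters_per_t, i + 1);
        });
    }

    std::uint64_t inside = count_inside(iters_per_t, 0);  // local share
    for (unsigned i = 0; i + 1 < no_threads; ++i) {
        threads[i].join();  // wait for the worker before reading its slot
        inside += inside_vec[i];
    }

    const std::uint64_t total = iters_per_t * no_threads;
    return 4.0 * static_cast<double>(inside) / static_cast<double>(total);
}
```

Calling `calc_pi_std_thread(4000000, 4)` should give a value close to Pi. If this version never shows zero counts while the pthreads version does, that would point towards the wrapper or the thread function rather than the partitioning.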