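I am experimenting with multithreading: I solve a set of quadratic equations with a sequential version as the baseline and compare it against several implementations that use the OpenMP API.
For the first parallel version, I simply distribute the workload by thread ID: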
delta = 0;
start = clock();
#pragma omp parallel num_threads(P) shared(x1, x2, b, a, c) private(delta)
{
    int threadID = omp_get_thread_num();
    for (int i = threadID; i < N; i += P)
    {
        delta = b[i] * b[i] - 4 * a[i] * c[i];
        if (delta >= 0)
        {
            x1[i] = (-b[i] + sqrt(delta)) / (2 * a[i]);
            x2[i] = (-b[i] - sqrt(delta)) / (2 * a[i]);
        }
    }
}
stop = clock();
durata_par = (double)(stop - start) / CLOCKS_PER_SEC;
printf("P_V1 %2.10f seconds\n", durata_par);
printf("P_V1 FA=%2.2f\n", durata_secv / durata_par);
printf("P_V1 E(%d)=%2.2f\n", P, (durata_secv / durata_par) / P);
Then I tried distributing the loop iterations with #pragma omp for:
delta = 0;
start = clock();
#pragma omp parallel num_threads(P) shared(x1, x2, b, a, c) private(delta)
{
    int threadID = omp_get_thread_num();
    int numberofThreads = omp_get_num_threads();
    if (threadID == 0)
    {
        std::cout << "Number of threads: " << numberofThreads << std::endl;
    }
    #pragma omp for
    for (int i = 0; i < N; i++)
    {
        delta = b[i] * b[i] - 4 * a[i] * c[i];
        if (delta >= 0)
        {
            x1[i] = (-b[i] + sqrt(delta)) / (2 * a[i]);
            x2[i] = (-b[i] - sqrt(delta)) / (2 * a[i]);
        }
    }
}
stop = clock();
durata_par = (double)(stop - start) / CLOCKS_PER_SEC;
printf("P_V2 %2.10f seconds\n", durata_par);
printf("P_V2 FA=%2.2f\n", durata_secv / durata_par);
printf("P_V2 E(%d)=%2.2f\n", P, (durata_secv / durata_par) / P);
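So far so good, but I noticed that if I remove the assignment to delta (the delta = 0 at the top of the second version), the execution time increases and the speedup factor (FA) drops from 4.4 to 1.4.
Is there an explanation for this behavior? (It should make no difference, since delta is declared private and each thread gets its own copy anyway.)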
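To take delta out of the picture while investigating, here is a minimal sketch of a variant I could compare against. It is not the code measured above. It relies on two facts: a private copy in OpenMP starts out uninitialized, so the outer delta = 0 never reaches the threads, and clock() reports processor time rather than elapsed time, which on some platforms is summed over all threads. The function name solve_all and the use of std::vector are assumptions for illustration only. Declaring delta inside the loop body makes the private clause unnecessary, and std::chrono::steady_clock measures wall-clock time.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical helper (not from the code above): solves
// a[i]*x^2 + b[i]*x + c[i] = 0 for every i and returns the elapsed
// wall-clock time in seconds. Entries without real roots are left untouched.
double solve_all(const std::vector<double>& a, const std::vector<double>& b,
                 const std::vector<double>& c, std::vector<double>& x1,
                 std::vector<double>& x2)
{
    const long n = static_cast<long>(a.size());
    const auto start = std::chrono::steady_clock::now();

    // Without OpenMP enabled at compile time the pragma is ignored and the
    // loop simply runs serially, producing the same results.
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
    {
        // Declared inside the loop body: every iteration has its own delta,
        // so no private clause and no outer initialization are involved.
        const double delta = b[i] * b[i] - 4 * a[i] * c[i];
        if (delta >= 0)
        {
            const double root = std::sqrt(delta);
            x1[i] = (-b[i] + root) / (2 * a[i]);
            x2[i] = (-b[i] - root) / (2 * a[i]);
        }
    }

    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

If the timings from a version like this stay the same with and without the outer delta = 0, the difference I saw would more likely come from how the time is measured than from the private clause. That is only a guess until it is measured.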