I was assigned to implement the idea of a reduction variable without using the reduction clause. I set up this basic code to test it.
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
for (int i = 0; i < n; ++i)
{
val += 1;
}
sum += val;
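So at the end, sum == n.
Each thread should have its own private val, and the addition to sum should then be a critical section where the threads converge, for example: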
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
val += 1;
}
#pragma omp critical
{
sum += val;
}
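I can't figure out how to keep the private instance of val alive for the critical section. I tried wrapping the whole thing in a larger pragma, for example: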
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
{
#pragma omp parallel for private(i) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
val += 1;
}
#pragma omp critical
{
sum += val;
}
}
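But I don't get the right answer. How should I set up the pragmas and clauses to do this?
Answer 0 (score: 4)
Your programs have several flaws. Let's look at each program (the flaws are written as comments).
Program One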
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel for private(i, val) shared(n) num_threads(nthreads)
for (int i = 0; i < n; ++i)
{
val += 1;
}
// The parallel region opened by "omp parallel for" ends together with the for loop,
// so from this point on only the initial (main) thread is running.
// That single thread is the only one to enter the critical section, and the val
// it reads is the original variable, not any thread's private copy.
// Note also that private(val) leaves each thread's copy uninitialized.
#pragma omp critical
{
sum += val;
}
Program Two
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
#pragma omp parallel private(val) shared(sum)
// pragma omp parallel creates the threads
{
#pragma omp parallel for private(i) shared(n) num_threads(nthreads)
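// There is no need to create another team of threads here.
// "omp parallel" always opens a new parallel region, so this one is nested:
// every thread of the outer team executes the whole loop (with its own inner team)
// instead of sharing the iterations. Use "#pragma omp for" inside the existing region.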
for (int i = 0; i < n; ++i)
{
val += 1;
}
#pragma omp critical
{
sum += val;
}
}
The best solution is:
int n = 100000000;
double sum = 0.0;
int nThreads = 5;
// Create the team of threads once, and declare the shared and private variables here.
// num_threads(nThreads) is a request, not a guarantee: it sets an upper bound,
// and the runtime may supply fewer threads than nThreads.
#pragma omp parallel shared(sum, n) num_threads(nThreads)
{
double val = 0.0; // declared inside the region, so each thread gets its own copy
// "omp for" only divides the iterations among the existing threads; no threads are created, hence no "parallel" here.
// nowait removes the implicit barrier after the loop, so each thread can go straight on to the critical section.
#pragma omp for nowait
for (int i = 0; i < n; ++i)
{
val += 1;
}
#pragma omp critical
{
sum += val;
}
}
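As a sanity check, the hand-rolled pattern above can be compared against the built-in reduction clause it is meant to replace. The following is a minimal sketch, not the answerer's code: the function names sum_manual and sum_clause are made up for illustration, and the num_threads clause is left out so the runtime picks the team size.

```c
/* Hand-rolled reduction: private partial sums combined in a critical section. */
double sum_manual(int n)
{
    double sum = 0.0;
    #pragma omp parallel
    {
        double val = 0.0;            /* one partial sum per thread */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            val += 1.0;
        #pragma omp critical
        sum += val;                  /* one thread at a time updates sum */
    }
    return sum;
}

/* Reference version using the reduction clause (hypothetical helper for comparison). */
double sum_clause(int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += 1.0;
    return sum;
}
```

Both functions should return a value equal to n. With GCC the pragmas take effect when compiling with -fopenmp; without that flag they are ignored and the code runs serially with the same result.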
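Answer 1 (score: 2)
You don't need to specify shared variables explicitly in OpenMP, because variables from the outer scope are shared by default (unless a default(none) clause is specified). Since private variables have undefined initial values, you should zero the private copy before the accumulation loop. Loop counters are recognized automatically and made private, so there is no need to declare them explicitly. Also, since you are only updating a single value, you should use the atomic construct, which is more lightweight than a full critical section.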
int i = 0;
int n = 100000000;
double sum = 0.0;
double val = 0.0;
// num_threads is a clause of the parallel construct, not of omp for
#pragma omp parallel private(val) num_threads(nthreads)
{
val = 0.0;
#pragma omp for
for (int i = 0; i < n; ++i)
{
val += 1;
}
#pragma omp atomic update
sum += val;
}
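The update clause was added to the atomic construct in OpenMP 3.1, so if your compiler conforms to an earlier OpenMP version (for example, MSVC++, which supports only OpenMP 2.0 even in VS2012), you have to remove the update clause. Since val is not used outside the parallel loop, it can be declared in an inner scope, as in veda's answer, in which case it becomes private automatically.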
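Putting those two remarks together gives a variant that declares val inside the region and uses a bare atomic. This is a sketch under assumptions: the function name sum_with_atomic is hypothetical, and the nowait clause is borrowed from the other answer rather than from this one.

```c
/* Partial sums kept in a block-local variable and merged with a plain atomic. */
double sum_with_atomic(int n)
{
    double sum = 0.0;
    #pragma omp parallel
    {
        double val = 0.0;    /* declared inside the region: private, and initialized */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            val += 1.0;
        /* No "update" clause, so pre-3.1 compilers should accept this form too. */
        #pragma omp atomic
        sum += val;
    }
    /* The implicit barrier at the end of the parallel region guarantees
       that every thread has added its partial sum before we return. */
    return sum;
}
```

Calling sum_with_atomic(n) should give a result equal to n, regardless of how many threads the runtime provides.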
Note that parallel for is a shortcut for nesting two OpenMP constructs, parallel and for:
#pragma omp parallel for sharing_clauses scheduling_clauses
for (...) {
}
is equivalent to:
#pragma omp parallel sharing_clauses
#pragma omp for scheduling_clauses
for (...) {
}
The same is true of the other two combined constructs: parallel sections and parallel workshare (Fortran only).
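To make the parallel for equivalence concrete, here is a small sketch that fills an array once with the combined construct and once with the split form. The function names fill_combined and fill_split and the 2.0 * i payload are invented for this illustration.

```c
/* Combined construct: creates the team and shares the loop in one directive. */
void fill_combined(double *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        out[i] = 2.0 * i;
}

/* Split form: "parallel" creates the team, "for" shares the loop among it. */
void fill_split(double *out, int n)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; ++i)
            out[i] = 2.0 * i;
    }
}
```

Both functions should leave identical contents in out. The split form is the one to reach for when each thread needs extra work before or after the loop inside the same region, as in the summation examples above.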