Take the following code as an example:
#include <stdio.h>
#include <omp.h>

void func(void) {
#pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        printf("%s: %d\n", __func__, omp_get_thread_num());
    }
}

int main(void) {
#pragma omp parallel for
    for (int i = 0; i < 2; i++) {
        printf("%s: %d\n", __func__, omp_get_thread_num());
        func();
    }
    return 0;
}
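I expected the main function to spawn 2 threads that each call func, and each of those func threads to spawn another 3 threads, so there would be 8 threads in total. But running the program above gives: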
$ ./a.out
main: 1
main: 0
func: 0
func: 0
func: 0
func: 0
func: 0
func: 0
func: 0
func: 0
This suggests that only the 2 outer threads were created. I then tried using collapse:
void func(void) {
#pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        printf("%s: %d\n", __func__, omp_get_thread_num());
    }
}

int main(void) {
#pragma omp parallel for collapse(2)
    for (int i = 0; i < 2; i++) {
        printf("%s: %d\n", __func__, omp_get_thread_num());
        func();
    }
    return 0;
}
The compiler complains as follows:
parallel.c: In function ‘main’:
parallel.c:15:3: error: not enough perfectly nested loops before ‘printf’
printf("%s: %d\n", __func__, omp_get_thread_num());
^~~~~~
So collapse seems to apply only to perfectly nested loops, as in the following case:
#pragma omp parallel for collapse(2)
for (int i = 0; i < 2; i++) {
    for (int j = 0; j < 4; j++) {
        printf("%s: %d\n", __func__, omp_get_thread_num());
    }
}
Is there any way to make the nested function run in parallel?
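
One direction that might be worth trying, as a sketch only: many OpenMP runtimes serialize a parallel region encountered inside another one unless more than one active level is allowed. The code below assumes that calling omp_set_max_active_levels(2) before the outer region is enough to permit an inner team. The names run_nested and func_nested are hypothetical, introduced only for this illustration, and the num_threads values are requests that the runtime may not honor in full. Output order will vary between runs.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_ancestor_thread_num(int level) { (void)level; return 0; }
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Inner region (hypothetical name): requests a team of 4 threads per call
 * and returns how many loop iterations were executed. */
static int func_nested(void) {
    int done = 0;
#pragma omp parallel for num_threads(4) reduction(+:done)
    for (int i = 0; i < 4; i++) {
        /* Level 1 is the outer region, so this reports the calling outer thread. */
        printf("func: outer %d, inner %d\n",
               omp_get_ancestor_thread_num(1), omp_get_thread_num());
        done += 1;
    }
    return done;
}

/* Outer region (hypothetical name): requests 2 threads, each calling func_nested. */
int run_nested(void) {
    /* Assumption: allowing 2 active levels lets the inner region get its own team. */
    omp_set_max_active_levels(2);
    int done = 0;
#pragma omp parallel for num_threads(2) reduction(+:done)
    for (int i = 0; i < 2; i++) {
        printf("main: %d\n", omp_get_thread_num());
        done += func_nested();
    }
    return done;
}
```

Calling run_nested() from main and building with OpenMP enabled (for example gcc -fopenmp) should show inner thread numbers other than 0 if the runtime grants the nested teams. Setting the OMP_MAX_ACTIVE_LEVELS environment variable to 2 is an alternative to the function call. The return value counts loop iterations, not threads, so it is 8 whether or not nesting is active.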