Question

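I have implemented a search algorithm in my Connect Four game, and I am now trying to parallelize it with OpenMP. To test it, I have reduced the board to 4x4.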

Running the program on a single core takes 45 s. Every parallel version I have tried takes 48-70 s, so I guess I am doing something wrong.
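
If these timings were taken with clock(), they may not be comparable: on Linux/glibc, clock() reports the processor time of the whole process, summed over all of its threads, so a parallel run can look slower even when it finishes sooner. A minimal sketch, assuming POSIX clock_gettime() is available; wall_seconds and cpu_seconds are hypothetical helper names, not part of the original program:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime() and CLOCK_MONOTONIC */
#endif
#include <time.h>

/* Elapsed real (wall-clock) time in seconds, from a monotonic POSIX clock. */
double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Processor time used by the process in seconds. With glibc this adds up
   the CPU time of every thread, so it grows with the number of threads. */
double cpu_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}
```

For example, store double t0 = wall_seconds(); before starting the search and print wall_seconds() - t0 afterwards. When compiling with OpenMP, omp_get_wtime() is another wall-clock option.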

I start the parallel block in the main "get_ai_turn" function:

#pragma omp parallel private(f, g, c, i)
{
algorithm(f, g);
}
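
With #pragma omp parallel, every thread in the team executes the enclosed block, so algorithm(f, g) runs the complete search once per thread instead of splitting it up. Note also that variables listed in private(...) start uninitialized in each thread; firstprivate(...) would copy in the values they had before the region. The small sketch below (runs_without_single and runs_with_single are hypothetical names) counts how often the body of a parallel region executes with and without #pragma omp single:

```c
/* How often does the body of a parallel region run? */
int runs_without_single(void) {
    int runs = 0;                 /* declared outside the region: shared */
    #pragma omp parallel
    {
        #pragma omp atomic
        runs++;                   /* executed once by every thread in the team */
    }
    return runs;                  /* team size; 1 when compiled without OpenMP */
}

int runs_with_single(void) {
    int runs = 0;
    #pragma omp parallel
    {
        #pragma omp single
        runs++;                   /* executed by exactly one thread */
    }
    return runs;                  /* always 1 */
}
```

Compiled with gcc -fopenmp, runs_without_single() returns the team size (which can be set with OMP_NUM_THREADS), while runs_with_single() returns 1. The usual pattern is therefore parallel plus single around the call that starts the search, with tasks doing the actual splitting.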

The "algorithm" function looks like this:

for(i=0;i<COLS;i++) {
        make_turn(i);
        if(no winner and board not full) {
            algorithm(f, g);
            undo_turn(g);
        }
        else if(there is a winner) {
            undo_turn(g);
        }
        else if(no winner and board full) {
            undo_turn(g);
        }
}
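
For experiments, the make/recurse/undo pattern above can be reduced to a self-contained toy. This is an assumption, not the real game: the hypothetical Board only tracks column heights, there is no win check, and count_lines simply counts the move sequences of a given length. What it shares with algorithm() is that one board is modified in place by the "make" step and restored by the "undo" step, so two threads cannot safely walk the tree on the same board object.

```c
#define COLS 4
#define ROWS 4

/* Hypothetical minimal board: only the number of pieces per column. */
typedef struct {
    int height[COLS];
} Board;

/* Sequential search: count all move sequences of length depth. */
long count_lines(Board *b, int depth) {
    if (depth == 0)
        return 1;                       /* leaf of the search tree */
    long total = 0;
    for (int c = 0; c < COLS; c++) {
        if (b->height[c] == ROWS)
            continue;                   /* column full: move not legal */
        b->height[c]++;                 /* make_turn(c) */
        total += count_lines(b, depth - 1);
        b->height[c]--;                 /* undo_turn() */
    }
    return total;
}
```

After the call returns, the board is back in its starting state, exactly as with make_turn/undo_turn.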

I have reduced the function to its most important parts to make it easier to read. My current attempt looks like this:

#pragma omp single
#pragma omp task
for(i=0;i<COLS;i++) {
        make_turn(i);
        if(no winner and board not full) {
            algorithm(f, g);
            undo_turn(g);
        }
        else if(there is a winner) {
            undo_turn(g);
        }
        else if(no winner and board full) {
            undo_turn(g);
        }
}
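
In this attempt, #pragma omp task applies to the statement that follows it, which here is the entire for loop, so only one task is created and the search still runs on a single thread. A common pattern is one task per candidate move, each working on its own copy of the board. The sketch below uses a hypothetical toy board (column heights only, no win check) rather than the original game code; root_task_search and per_col are illustrative names.

```c
#define COLS 4
#define ROWS 4

/* Hypothetical minimal board: pieces per column only, no win check. */
typedef struct {
    int height[COLS];
} Board;

/* Sequential search run inside each task: counts move sequences. */
static long count_lines(Board *b, int depth) {
    if (depth == 0)
        return 1;
    long total = 0;
    for (int c = 0; c < COLS; c++) {
        if (b->height[c] == ROWS)
            continue;
        b->height[c]++;
        total += count_lines(b, depth - 1);
        b->height[c]--;
    }
    return total;
}

/* One task per first move, each on its own board. Assumes depth >= 1. */
long root_task_search(int depth) {
    long per_col[COLS] = {0};   /* one result slot per task, so no data race */
    /* one thread (single) creates the tasks; any thread in the team runs them */
    #pragma omp parallel
    #pragma omp single
    {
        for (int c = 0; c < COLS; c++) {
            #pragma omp task firstprivate(c) shared(per_col)
            {
                Board b = {{0}};    /* private copy of the empty start position */
                b.height[c] = 1;    /* make_turn(c), applied to the copy only */
                per_col[c] = count_lines(&b, depth - 1);
            }
        }
        /* wait for the tasks above before their results are summed */
        #pragma omp taskwait
    }
    long total = 0;
    for (int c = 0; c < COLS; c++)
        total += per_col[c];
    return total;
}
```

Copying the board per task costs a little memory but removes the shared state that make_turn/undo_turn would otherwise race on.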

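I think the best approach would be to use only 4 threads and have them split up the iterations of the first loop. What is the best way to implement this? I am not sure whether I am going about it the right way.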
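
Splitting only the first loop over exactly four threads can be written as a worksharing loop. Again this is a sketch on a hypothetical toy board, not the game code: num_threads(4) fixes the team size, reduction(+:total) combines the per-thread sums, and every iteration builds its own board so no thread touches another thread's position.

```c
#define COLS 4
#define ROWS 4

/* Hypothetical minimal board: pieces per column only, no win check. */
typedef struct {
    int height[COLS];
} Board;

/* Sequential search: counts move sequences of length depth. */
static long count_lines(Board *b, int depth) {
    if (depth == 0)
        return 1;
    long total = 0;
    for (int c = 0; c < COLS; c++) {
        if (b->height[c] == ROWS)
            continue;
        b->height[c]++;
        total += count_lines(b, depth - 1);
        b->height[c]--;
    }
    return total;
}

/* Root moves shared out over exactly 4 threads. Assumes depth >= 1. */
long root_parallel_for(int depth) {
    long total = 0;
    /* dynamic,1: a thread that finishes early takes the next column */
    #pragma omp parallel for num_threads(4) schedule(dynamic, 1) reduction(+:total)
    for (int c = 0; c < COLS; c++) {
        Board b = {{0}};            /* per-iteration board, never shared */
        b.height[c] = 1;            /* first move in column c */
        total += count_lines(&b, depth - 1);
    }
    return total;
}
```

With only COLS = 4 iterations, the run time is bounded by the largest subtree. In the real game the subtrees are likely to differ in size (some lines end early with a win), which is one reason to create tasks a level or two deeper than the root.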

Edit:

for(i=0;i<COLS;i++) {
 #pragma omp task final(4)
        make_turn(i);
        if(no winner and board not full) {
            algorithm(f, g);
            undo_turn(g);
        }
        else if(there is a winner) {
            undo_turn(g);
        }
        else if(no winner and board full) {
            undo_turn(g);
        }
}

So I did it the way it was suggested, but my program only got slower :s I tried final(4) and final(8), where the number equals the number of threads used.
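
Two details are worth noting here. The argument of final is a condition, not a thread count: any nonzero value such as 4 or 8 is simply true, so final(4) and final(8) behave identically. Every task created with them is final, and every task generated inside a final task is executed immediately (included) by the thread that encounters it. Also, in the code above the task directive applies only to the next statement, make_turn(i);, not to the rest of the loop body. The sketch below, again on a hypothetical toy board, ties final to the search depth instead: tasks are created normally near the root, and below a hypothetical CUTOFF_PLY the child tasks run inline.

```c
#define COLS 4
#define ROWS 4
#define CUTOFF_PLY 2   /* hypothetical: tasks created at ply >= 2 are final */

/* Hypothetical minimal board: pieces per column only, no win check. */
typedef struct {
    int height[COLS];
} Board;

/* Task-parallel search; ply is the distance from the root. */
static long task_search(const Board *b, int depth, int ply) {
    if (depth == 0)
        return 1;
    long sub[COLS] = {0};              /* one result slot per child task */
    for (int c = 0; c < COLS; c++) {
        if (b->height[c] == ROWS)
            continue;
        /* deep tasks are final: the tasks they create run inline */
        #pragma omp task final(ply >= CUTOFF_PLY) firstprivate(c) shared(sub)
        {
            Board copy = *b;           /* each task works on its own board */
            copy.height[c]++;          /* make_turn(c) */
            sub[c] = task_search(&copy, depth - 1, ply + 1);
        }
    }
    /* children must finish before sub is read (and before *b goes away) */
    #pragma omp taskwait
    long total = 0;
    for (int c = 0; c < COLS; c++)
        total += sub[c];
    return total;
}

long final_search(int depth) {
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    {
        Board start = {{0}};
        result = task_search(&start, depth, 0);
    }
    return result;
}
```

A common alternative is a manual cutoff that calls a plain sequential search once ply reaches the limit; it avoids even the bookkeeping of included tasks, and which variant is faster likely depends on the OpenMP runtime.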

I added #pragma omp taskwait at the end of the parallel block.

OpenMP - for-loop with recursive call - c
