Question

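I have a larger project that I won't post in full here, because it is too much C++ code for a single post. It is a checkers AI that uses minimax, together with an evaluation function of my own design, to find the best move in a game of checkers. For about a month I have been struggling with unexpected results.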

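The problem is in implementing minimax with alpha-beta pruning. I implemented both plain minimax and minimax with alpha-beta pruning, but when I pit the two AIs against each other, the one without alpha-beta pruning beats the one with it. I recently realized that if I turn off compiler optimizations, the two AIs win about equally often against each other.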

Is it bad, or considered poor style, if a program only achieves the result I want when optimizations are turned off?

My test consists of 500 games of one AI against the other, alternating which AI moves first.

I can post the code, but it doesn't seem very relevant to this question.

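Edit: I have seen a lot of comments about using valgrind and turning on warnings. I have done both, but the problem in my code has never caused the program to crash, and I'm fairly sure my main memory leaks have been fixed. I can run the program continuously for about a week without a crash (it did not crash after a week; it simply finished). The problem was never a runtime error: the program wins about 50% less often than it should.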

/**
 * Minimax with alpha-beta pruning. Alternates between the maximizing and
 * minimizing players while moving through a tree of nodes, and returns the
 * value of the most favorable move assuming the opponent also plays its best.
 * Uses alpha-beta pruning to cut off subtrees which will not need to be
 * evaluated.
 * @param node the head of the tree
 * @param depth the depth in the tree to pursue
 * @param maximizingPlayer true when it is the maximizing player's turn;
 *        should be true on the initial call
 * @param alpha the alpha cutoff value; on the initial call this should be
 *        -infinity
 * @param beta the beta cutoff value; on the initial call this should be
 *        +infinity
 * @return the value of the best node to choose which will be found in the
 *        successors of the head
 */
int AI::minimaxAB(Node *node, int depth, bool maximizingPlayer,
        int alpha, int beta) {
    int returnValue;
    if (depth == 0 || node->isTerminal()) {
        // we have reached our target depth or the end of the game 
        // so evaluate the board
        returnValue = evaluateBoardState(node->getBoardState());
        node->setValue(returnValue);
        return returnValue;
    }

    auto *successors = node->getSuccessors();
    // each time minimax is recursively called it returns the node from the
    // params with the best value from its successors as its value
    if (maximizingPlayer) {
        // set value to something lower than is possible in the game
        returnValue = MIN;
        // set the curBest to something to be overwritten
        node->setValue(returnValue);
        for (auto &n : *successors) {
            returnValue = max(node->getValue(),
                    minimaxAB(n, depth - 1, false, alpha, beta));
            node->setValue(returnValue);
            alpha = max(alpha, returnValue);

            // once alpha reaches beta, the remaining successors cannot
            // change the result, so stop searching them
            if (beta <= alpha)
                break; // causes worse moves to be chosen
        }
        return returnValue;

    } else { // minimizing player
        returnValue = MAX;
        // set the curBest to something to be overwritten
        node->setValue(returnValue);
        for (auto &n : *successors) {
            // Compare the new minimax node to the last one
            returnValue = min(node->getValue(),
                    minimaxAB(n, depth - 1, true, alpha, beta));
            node->setValue(returnValue);
            beta = min(beta, returnValue);

            // once beta drops to alpha, the remaining successors cannot
            // change the result, so stop searching them
            if (beta <= alpha)
                break; // causes worse moves to be chosen
        }
        return returnValue;
    }
}

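MIN and MAX are constant integers that are, respectively, smaller and larger than any value the evaluation function can reach. The min and max functions return the smaller and the larger of two integers, respectively.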
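Alpha-beta pruning should return the same root value as plain minimax, so one way to test the pruning logic separately from the optimization level is to compare the two searches on a tiny hand-built tree. The following is a minimal sketch under assumptions: TreeNode, leaf, inner, sampleTree, plainMinimax, alphaBeta and checkSampleTree are hypothetical names and are not part of the project's Node or AI classes.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Node: a leaf carries a score,
// an inner node carries children.
struct TreeNode {
    int score = 0;
    std::vector<TreeNode> children;
};

TreeNode leaf(int s) { TreeNode n; n.score = s; return n; }
TreeNode inner(std::vector<TreeNode> kids) {
    TreeNode n;
    n.children = std::move(kids);
    return n;
}

// Plain minimax: visits every node of the tree.
int plainMinimax(const TreeNode &n, bool maximizing) {
    if (n.children.empty()) return n.score;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const TreeNode &c : n.children) {
        int v = plainMinimax(c, !maximizing);
        best = maximizing ? std::max(best, v) : std::min(best, v);
    }
    return best;
}

// Minimax with alpha-beta pruning: same root value, fewer nodes visited.
int alphaBeta(const TreeNode &n, bool maximizing, int alpha, int beta) {
    if (n.children.empty()) return n.score;
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const TreeNode &c : n.children) {
        int v = alphaBeta(c, !maximizing, alpha, beta);
        if (maximizing) {
            best = std::max(best, v);
            alpha = std::max(alpha, best);
        } else {
            best = std::min(best, v);
            beta = std::min(beta, best);
        }
        if (beta <= alpha) break;  // remaining siblings cannot matter
    }
    return best;
}

// A max root over three min nodes.
TreeNode sampleTree() {
    return inner({inner({leaf(3), leaf(12), leaf(8)}),
                  inner({leaf(2), leaf(4), leaf(6)}),
                  inner({leaf(14), leaf(5), leaf(2)})});
}

// Both searches must agree on the root value.
void checkSampleTree() {
    TreeNode root = sampleTree();
    assert(plainMinimax(root, true) ==
           alphaBeta(root, true, INT_MIN, INT_MAX));
}
```

If the two searches disagree on a tree like this, the pruning logic itself is suspect. If they agree at every optimization level, the difference is more likely to come from elsewhere in the program.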

Answer 1

Is it bad? Yes.

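For one thing, you are missing out on performance, which is usually one of the most compelling reasons to use C++: you can write abstractions that the optimizer compiles away.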

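More importantly, though, you are relying on undefined behavior. That means your code can break when you enable optimizations, switch compilers, switch compiler versions, or even just run it a second time.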

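Since you haven't specified your compiler, here is a hint for Clang and GCC: compile your program with -fsanitize=undefined (UBSan). This instruments your executable and reports the UB that the compiler is relying on.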
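To illustrate the kind of undefined behavior that only shows up under optimization, here is a minimal sketch. It is a hypothetical example and is not taken from the question's code; hasRoomUB, hasRoom and checkHasRoom are made-up names. Building with something like g++ -O2 -g -fsanitize=undefined and calling hasRoomUB(INT_MAX) should produce a signed integer overflow report at run time.

```cpp
#include <cassert>
#include <climits>

// Hypothetical example. Signed overflow is undefined, so an optimizer is
// allowed to assume `x + 1 > x` always holds and reduce this function to
// `return true;`. Without optimization it may instead wrap and return false.
bool hasRoomUB(int x) { return x + 1 > x; }

// Well-defined version: compares against the limit without overflowing.
bool hasRoom(int x) { return x < INT_MAX; }

// Exercises only the well-defined version; hasRoomUB(INT_MAX) would be UB.
void checkHasRoom() {
    assert(hasRoom(0));
    assert(!hasRoom(INT_MAX));
}
```

A program that depends on hasRoomUB can give different answers at -O0 and -O2 without ever crashing, which matches the symptom described in the question.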

Answer 2

Yes, it is bad. Your program evidently contains undefined behavior.

Any undefined behavior in a program is a bug. You do not want these bugs in your code. Eliminate them.

What should you do?

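First, raise your compiler's warning level and fix whatever the compiler complains about. This is the easiest way to reduce undefined behavior in your program, and it catches most of what the optimizer will exploit.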

Equally important, run your application under valgrind. This will catch most memory-related errors.

Answer 3

Does your AI incorporate run-time measurements in any way?

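For example, you might run a breadth-first search that keeps exploring the game-state tree until a time limit is reached. Or, if you seed the RNG from a timestamp several times within the same program, the elapsed time will affect the RNG.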

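If the results depend on timing, the behavior can differ considerably between optimization levels even without any undefined behavior. (That said, reseeding the RNG during execution is itself a bug.)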
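One way to rule timing out is to seed the generator exactly once with a fixed value while testing, so that runs are reproducible regardless of how fast the code executes. The following is a minimal sketch using the standard <random> header; drawSequence and checkReproducible are hypothetical names, and the question does not say which RNG the project uses.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draws `count` values from a generator that is seeded
// exactly once with a fixed seed and never reseeded from a clock.
std::vector<std::uint32_t> drawSequence(std::uint32_t seed, int count) {
    std::mt19937 rng(seed);
    std::vector<std::uint32_t> out;
    for (int i = 0; i < count; ++i) {
        out.push_back(static_cast<std::uint32_t>(rng()));
    }
    return out;
}

// The same seed yields the same sequence, independent of elapsed time.
void checkReproducible() {
    assert(drawSequence(42, 5) == drawSequence(42, 5));
}
```

If the win rates of the two builds converge once the seed is fixed and the search is limited by depth rather than by time, the difference came from timing and not from the optimizer.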

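I agree with the others that the behavior you observe is a symptom of undefined behavior. But other explanations do exist, and in AI they seem plausible enough to be worth mentioning.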

Is it bad if my program only achieves the result I want when compiler optimizations are turned off?
