Question

I am developing an AI for a game, and I want to use the MinMax algorithm with Alpha-Beta pruning.

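I have a rough idea of how it works, but I still can't write the code from scratch, so I have spent the last two days looking online for some kind of pseudocode.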

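My problem is that every pseudocode I found online seems to be based on finding the value of the best move, whereas I need to return the best move itself, not a number.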

My current code is based on this pseudocode (source):

minimax(level, player, alpha, beta){  // player may be "computer" or "opponent"
    if (gameover || level == 0)
       return score
    children = all valid moves for this "player"
    if (player is computer, i.e., max's turn){
       // Find max and store in alpha
       for each child {
          score = minimax(level - 1, opponent, alpha, beta)
          if (score > alpha) alpha = score
          if (alpha >= beta) break;  // beta cut-off
       }
       return alpha
    } else {  // player is opponent, i.e., min's turn
       // Find min and store in beta
       for each child {
          score = minimax(level - 1, computer, alpha, beta)
          if (score < beta) beta = score
          if (alpha >= beta) break;  // alpha cut-off
       }
       return beta
    }
}

// Initial call with alpha=-inf and beta=inf
minimax(2, computer, -inf, +inf)
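For reference, the value-only pseudocode above translates almost line for line into Java. This is a minimal sketch that assumes a hypothetical explicit tree of `Node` objects (each carrying a precomputed `score`) instead of a real board, and uses `Integer.MIN_VALUE`/`Integer.MAX_VALUE` for -inf/+inf. Like the pseudocode, it returns only a number:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical explicit game-tree node; score is the static evaluation of this position. */
final class Node {
    final int score;
    final List<Node> children; // empty for terminal positions

    Node(int score, Node... children) {
        this.score = score;
        this.children = Arrays.asList(children);
    }
}

final class AlphaBetaValue {
    /** Value-only alpha-beta; maximizing == true means it is the computer's turn. */
    static int minimax(Node node, int level, boolean maximizing, int alpha, int beta) {
        if (node.children.isEmpty() || level == 0) {
            return node.score; // gameover || level == 0: return the evaluation
        }
        if (maximizing) {
            // Find max and store in alpha
            for (Node child : node.children) {
                int score = minimax(child, level - 1, false, alpha, beta);
                if (score > alpha) alpha = score;
                if (alpha >= beta) break; // beta cut-off
            }
            return alpha;
        } else {
            // Find min and store in beta
            for (Node child : node.children) {
                int score = minimax(child, level - 1, true, alpha, beta);
                if (score < beta) beta = score;
                if (alpha >= beta) break; // alpha cut-off
            }
            return beta;
        }
    }
}
```

On a root whose two children have leaves {3, 5} and {2, 9}, `minimax(root, 2, true, Integer.MIN_VALUE, Integer.MAX_VALUE)` returns 3, and the second subtree is cut off after its first leaf. Nothing in the result says which root child produced that 3.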

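As you can see, this code returns a number, which I guess is needed to make everything work (since the returned number is used during the recursion).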

So I thought I could use an external variable to store the best move, and this is how I changed the previous code:

minimax(level, player, alpha, beta){  // player may be "computer" or "opponent"
    if (gameover || level == 0)
       return score
    children = all valid moves for this "player"
    if (player is computer, i.e., max's turn){
       // Find max and store in alpha
       for each child {
          score = minimax(level - 1, opponent, alpha, beta)
          if (score > alpha) {
              alpha = score
               bestMove = current child // LINE THAT I ADDED TO UPDATE THE BEST MOVE
          }
          if (alpha >= beta) break;  // beta cut-off
       }
       return alpha
    } else {  // player is opponent, i.e., min's turn
       // Find min and store in beta
       for each child {
          score = minimax(level - 1, computer, alpha, beta)
          if (score < beta) beta = score
          if (alpha >= beta) break;  // alpha cut-off
       }
       return beta
    }
}

// Initial call with alpha=-inf and beta=inf
minimax(2, computer, -inf, +inf)

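Now, this makes sense to me, because we need to update the best move only when it is the player's turn, and only if the move is better than the previous one.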
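One way to make that rule precise is to record a move only in a separate root loop, where every candidate is the computer's own move, and let the recursive search below it return plain values. A minimal Java sketch, assuming a hypothetical explicit tree of `GameNode`s (not the `Board`/`Move` classes used later in this question) whose root children are the computer's candidate moves:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical explicit game-tree node; the root's children are the computer's candidate moves. */
final class GameNode {
    final int score;               // static evaluation of this position
    final List<GameNode> children; // empty for terminal positions

    GameNode(int score, GameNode... children) {
        this.score = score;
        this.children = Arrays.asList(children);
    }
}

final class RootSearch {
    /** Value-only alpha-beta below the root; never records a move. */
    static int alphaBeta(GameNode node, int depth, boolean maximizing, int alpha, int beta) {
        if (depth == 0 || node.children.isEmpty()) return node.score;
        for (GameNode child : node.children) {
            int score = alphaBeta(child, depth - 1, !maximizing, alpha, beta);
            if (maximizing) alpha = Math.max(alpha, score);
            else beta = Math.min(beta, score);
            if (alpha >= beta) break; // cut-off
        }
        return maximizing ? alpha : beta;
    }

    /** Index of the computer's best move at the root, or -1 if it has no moves. */
    static int bestMoveIndex(GameNode root, int depth) {
        int bestIndex = -1;
        int alpha = Integer.MIN_VALUE;
        for (int i = 0; i < root.children.size(); i++) {
            // After the computer's move it is the opponent's turn, so the child is a minimizing node.
            int score = alphaBeta(root.children.get(i), depth - 1, false, alpha, Integer.MAX_VALUE);
            if (bestIndex == -1 || score > alpha) {
                alpha = score;   // the only place where a move is recorded
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

With root children whose leaves are {3, 5}, {2, 9} and {4, 6}, `bestMoveIndex(root, 2)` returns 2. Because the root loop never iterates over the opponent's replies, the recorded index can only be one of the computer's moves.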

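So, although I think this is correct (even if I'm not 100% sure), the source also has a Java implementation that updates bestMove in the score < beta case as well, even for a call like minmax(2, black, a, b), and I don't understand why.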

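Trying that implementation made my code choose a move of the opposing player as the best move, which doesn't seem correct (assuming I am the black player, I am looking for the best move I can make, so I expect a "black" move, not a "white" one).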

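I don't know whether my pseudocode (the second one) is the correct way to find the best move using MinMax with alpha-beta pruning, or whether I need to update the best move in the score < beta case as well.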

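Feel free to suggest any new and better pseudocode if you like. I have no constraints, and I don't mind rewriting some code if the result is better than mine.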

EDIT:

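Since I couldn't understand the replies, I guess the question may not be asking what I actually want to know, so I'm trying to phrase it better here.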

Provided that I want to get the best move for one player only, and that this player (the maximizer) is passed into the MinMax function every time I need a new move (so that minmax(2, black, a, b) returns the best move for the black player, while minmax(2, white, a, b) returns the best move for the white player), how would you change the first pseudocode (or the Java implementation in the source) to store this best move somewhere?

EDIT 2:

Let's see if we can make it work this way.

This is my implementation. Could you please tell me whether it is correct?

//PlayerType is an enum with just White and Black values, opponent() returns the opposite player type
protected int minMax(int alpha, int beta, int maxDepth, PlayerType player) {
    if (!canContinue()) {
        return 0;
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    int value = 0;
    boolean isMaximizer = (player.equals(playerType)); // playerType is the player used by the AI
    if (maxDepth == 0 || board.isGameOver()) {
        value = evaluateBoard();
        return value;
    }
    while (movesIterator.hasNext()) {
        Move currentMove = movesIterator.next();
        board.applyMove(currentMove);
        value = minMax(alpha, beta, maxDepth - 1, player.opponent());
        board.undoLastMove();
        if (isMaximizer) {
            if (value > alpha) {
                selectedMove = currentMove;
                alpha = value;
            }
        } else {
            if (value < beta) {
                beta = value;
            }
        }
        if (alpha >= beta) {
            break;
        }
    }
    return (isMaximizer) ? alpha : beta;
}

EDIT 3:

New implementation based on @Codor's answer/comment:

private class MoveValue {

    public Move move;
    public int value;

    public MoveValue() {
        move = null;
        value = 0;
    }

    public MoveValue(Move move, int value) {
        this.move = move;
        this.value = value;
    }

    @Override
    public String toString() {
        return "MoveValue{" + "move=" + move + ", value=" + value + '}';
    }

}

protected MoveValue minMax(int alpha, int beta, int maxDepth, PlayerType player) {
    if (!canContinue()) {
        return new MoveValue();
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    MoveValue moveValue = new MoveValue();
    boolean isMaximizer = (player.equals(playerType));
    if (maxDepth == 0 || board.isGameOver()) {
        moveValue.value = evaluateBoard();
        return moveValue;
    }
    while (movesIterator.hasNext()) {
        Move currentMove = movesIterator.next();
        board.applyMove(currentMove);
        moveValue = minMax(alpha, beta, maxDepth - 1, player.opponent());
        board.undoLastMove();
        if (isMaximizer) {
            if (moveValue.value > alpha) {
                selectedMove = currentMove;
                alpha = moveValue.value;
            }
        } else {
            if (moveValue.value < beta) {
                beta = moveValue.value;
                selectedMove = currentMove;
            }
        }
        if (alpha >= beta) {
            break;
        }
    }
    return (isMaximizer) ? new MoveValue(selectedMove, alpha) : new MoveValue(selectedMove, beta);
}

I don't know whether I got it right or did something wrong, but I'm back to the problem I had when I posted the question:

calling minMax(Integer.MIN_VALUE, Integer.MAX_VALUE, 1, PlayerType.Black) returns a move that can only be made by the white player, which is not what I need.

I need the best move for the given player, not the best move over the whole board.

Answer 1

After some research and a lot of time spent on this problem, I came up with this solution, which seems to work.

private class MoveValue {

    public double returnValue;
    public Move returnMove;

    public MoveValue() {
        returnValue = 0;
    }

    public MoveValue(double returnValue) {
        this.returnValue = returnValue;
    }

    public MoveValue(double returnValue, Move returnMove) {
        this.returnValue = returnValue;
        this.returnMove = returnMove;
    }

}


protected MoveValue minMax(double alpha, double beta, int maxDepth, MarbleType player) {       
    if (!canContinue()) {
        return new MoveValue();
    }        
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    double value = 0;
    boolean isMaximizer = (player.equals(playerType)); 
    if (maxDepth == 0 || board.isGameOver()) {            
        value = evaluateBoard();            
        return new MoveValue(value);
    }
    MoveValue returnMove;
    MoveValue bestMove = null;
    if (isMaximizer) {           
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue < returnMove.returnValue)) {
                bestMove = returnMove;
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue > alpha) {
                alpha = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = beta;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    } else {
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue > returnMove.returnValue)) {
                bestMove = returnMove;
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue < beta) {
                beta = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = alpha;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    }   
}

Answer 2

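This is somewhat difficult, because the given code is not an actual Java implementation; to achieve what you want, there would have to be concrete types representing moves and positions in the game tree. Usually the game tree is not encoded explicitly but navigated in a sparse representation: the implementation actually performs the move in question, recursively evaluates the resulting smaller problem, and then undoes the move. This amounts to a depth-first search in which the call stack represents the current path.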

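To get the actual best move, simply return from your method the instance that maximizes the subsequent evaluation. It may help to first implement the plain Minimax algorithm without alpha-beta pruning, and to add the pruning in a later step once the basic structure works.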
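A minimal sketch of that advice, assuming a hypothetical `Pile` game (players alternately remove 1 to 3 stones; whoever takes the last stone wins) in place of a real board. It is plain minimax without pruning, uses make/undo on one mutable position as described above, and returns the move made at the current level together with its value in a hypothetical `ScoredMove` class. Only the move returned by the root call is meant to be played; values are from the computer's point of view:

```java
/** Hypothetical mutable game: players alternately remove 1-3 stones; whoever takes the last stone wins. */
final class Pile {
    int stones;
    Pile(int stones) { this.stones = stones; }
    boolean isGameOver() { return stones == 0; }
    void apply(int take) { stones -= take; }
    void undo(int take) { stones += take; }
}

/** A value together with the move that produced it at the level where it was returned. */
final class ScoredMove {
    final int value;
    final int move; // stones taken; -1 when no move was made at this level
    ScoredMove(int value, int move) { this.value = value; this.move = move; }
}

final class PlainMinimax {
    /** Plain minimax (no pruning); values from the computer's view: +1 win, -1 loss, 0 unknown. */
    static ScoredMove minimax(Pile pile, int depth, boolean computerToMove) {
        if (pile.isGameOver()) {
            // The player who just moved took the last stone and won.
            return new ScoredMove(computerToMove ? -1 : 1, -1);
        }
        if (depth == 0) return new ScoredMove(0, -1); // depth limit: neutral evaluation
        ScoredMove best = null;
        for (int take = 1; take <= Math.min(3, pile.stones); take++) {
            pile.apply(take);                                   // make the move
            int value = minimax(pile, depth - 1, !computerToMove).value;
            pile.undo(take);                                    // undo it; the call stack holds the path
            boolean better = best == null
                    || (computerToMove ? value > best.value : value < best.value);
            if (better) best = new ScoredMove(value, take);     // keep the move made at *this* level
        }
        return best; // never null here: a non-empty pile always allows taking one stone
    }
}
```

For example, `minimax(new Pile(5), 10, true)` returns value 1 with move 1 (take one stone, leaving the opponent a multiple of four), and the pile is back at 5 stones afterwards. Once this works, alpha and beta can be threaded through the same loop.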

The implementation linked in the question (section 1.5) actually does return the best move, as the comment below shows.

/** Recursive minimax at level of depth for either
    maximizing or minimizing player.
    Return int[3] of {score, row, col}  */

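No user-defined type is used here to represent a move, but the method returns three values: the best score found, and the coordinates of the square the player would move to in order to actually make that best move (which the implementation already does in order to obtain the score). These coordinates are the representation of the actual move.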
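The same `{score, row, col}` idea with alpha-beta pruning added, sketched for a hypothetical tic-tac-toe board in which `'X'` is the maximizing computer. This is not the linked source's code; the point it illustrates is that `row`/`col` are only set for moves made by `player` at the current level, and the parent reads just the score (`[0]`) of its children, so the coordinates returned by the root call are always one of the computer's own moves:

```java
/** Hypothetical tic-tac-toe board: 'X' is the maximizing computer, 'O' the opponent, 0 = empty. */
final class TicTacToe {
    final char[][] cells = new char[3][3];

    /** Returns 'X' or 'O' if that player has three in a row, otherwise 0. */
    char winner() {
        for (int i = 0; i < 3; i++) {
            if (cells[i][0] != 0 && cells[i][0] == cells[i][1] && cells[i][1] == cells[i][2]) return cells[i][0];
            if (cells[0][i] != 0 && cells[0][i] == cells[1][i] && cells[1][i] == cells[2][i]) return cells[0][i];
        }
        boolean diag = cells[0][0] == cells[1][1] && cells[1][1] == cells[2][2];
        boolean anti = cells[0][2] == cells[1][1] && cells[1][1] == cells[2][0];
        return (cells[1][1] != 0 && (diag || anti)) ? cells[1][1] : 0;
    }

    /** Alpha-beta search; returns {score, row, col}, row/col being player's move at this level (-1 if none). */
    int[] minimax(int level, char player, int alpha, int beta) {
        char w = winner();
        if (w == 'X') return new int[] {10, -1, -1};
        if (w == 'O') return new int[] {-10, -1, -1};
        if (level == 0) return new int[] {0, -1, -1}; // hypothetical neutral heuristic at the depth limit
        int bestRow = -1, bestCol = -1;
        boolean anyMove = false;
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                if (cells[r][c] != 0) continue;
                anyMove = true;
                cells[r][c] = player;                                  // make the move
                int score = minimax(level - 1, player == 'X' ? 'O' : 'X', alpha, beta)[0];
                cells[r][c] = 0;                                       // undo it
                if (player == 'X' && score > alpha) { alpha = score; bestRow = r; bestCol = c; }
                if (player == 'O' && score < beta) { beta = score; bestRow = r; bestCol = c; }
                if (alpha >= beta) {                                   // cut-off
                    return new int[] {player == 'X' ? alpha : beta, bestRow, bestCol};
                }
            }
        }
        if (!anyMove) return new int[] {0, -1, -1}; // full board with no winner: draw
        return new int[] {player == 'X' ? alpha : beta, bestRow, bestCol};
    }
}
```

With X on (0,0) and (0,1) and O on (1,0) and (1,1), `minimax(9, 'X', Integer.MIN_VALUE, Integer.MAX_VALUE)` returns `{10, 0, 2}`, the winning X move, and leaves the board unchanged.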

Find the best move using MinMax with Alpha-Beta pruning
