Question

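I have implemented the MiniMax algorithm (with alpha-beta pruning), but it behaves in an interesting way. My player builds up a huge lead, but when it is time to make the final, winning move, it does not make that move and just keeps dragging the game out.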

Here is my minimax function:

// Game states are represented by Node objects (holds the move and the board in that state)
//ValueStep is just a pair holding the minimax value and a game move (step) 

private ValueStep minimax(Node gameState,int depth,int alpha,int beta) {

  //Node.MAXDEPTH is a constant
  if(depth == Node.MAXDEPTH || gameOver(gameState.board)) {
      return new ValueStep(gameState.heuristicValue(),gameState.step);
  }

  //this method definitely works. child nodes are created with a move, an
  //updated board and a MAX value,
  //which determines if they are the maximizing or minimizing player's game states.
  gameState.children = gameState.findPossibleStates();

  if(gameState.MAX) { //maximizing player
      ValueStep best = null;

      for(Node child: gameState.children) {

          ValueStep vs = new ValueStep(minimax(child,depth+1,alpha,beta).value,child.move);

          //values updated here if needed
          if(best==null || vs.value > best.value) best = vs;

          if(vs.value > alpha) alpha = vs.value;

          if(alpha >= beta) break;
      }

      return best;

  } else { //minimizing player
      ValueStep best = null;

      for(Node child: gameState.children) {

          ValueStep vs = new ValueStep(minimax(child,depth+1,alpha,beta).value,child.move);

          if(best==null || vs.value < best.value) best = vs;

          if(vs.value < beta) beta = vs.value;

          if(alpha >= beta) break;
      }

      return best;
  }

}

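At first I thought the problem was in my evaluation function, but if it is, I can't find it. In this game both players have a score, and my function simply calculates the heuristic value from the score difference. Here it is: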

public int heuristicValue() {

       //I calculate the score difference here in this state and save it in 
       //the variable scoreDiff. scoreDiff will be positive if I am winning 
       //here, negative if I'm losing.

        //"this" is a Node object here. If the game is over here, special
        //heuristic values are returned, depending on who wins (or if it's a 
        //draw) 
        if(gameOver(this.board)) {
            if(scoreDiff>0) {
                return Integer.MAX_VALUE;  
            } else if(scoreDiff==0) {
                return 0;
            } else {
                return Integer.MIN_VALUE;
            }
        }

        int value = 0;
        value += 100*scoreDiff; //calculate the heuristic value using the score difference. If it's high, the value will be high as well 

      return value;
  }

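I have "translated" my code into English, so there may be typos. I am fairly sure the problem is somewhere in here, but if you need other code I will update the question. Again, my player can build up an advantage, but for some reason it won't make the final winning move. I appreciate your help!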

Answer 1

Suppose your Minimax player is in a position where it can prove that it can guarantee a win. There will often be many different ways in which it can still guarantee the eventual win. Some moves may be instant wins, some moves may drag the game on unnecessarily... as long as it is not a really stupid move that suddenly allows the opponent to win (or draw), they are all wins and they all have the same game-theoretic value (Integer.MAX_VALUE in your code).

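Your Minimax algorithm does not distinguish between those moves and simply plays whichever move happens to be the first in the list with the value Integer.MAX_VALUE. That may be a fast, shallow win, or it may be a slow, very deep win.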
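To make the tie-breaking concrete, here is a minimal sketch of what the strict `>` comparison in the maximizing loop does when several root moves share the same value. The class name `FirstMaxWins`, the method `pickFirstBest` and the sample values are made up for illustration; they are not part of the question's code.

```java
public class FirstMaxWins {
    /**
     * Mirrors the "vs.value > best.value" test from the maximizing loop:
     * returns the index of the first move that has the maximal value.
     */
    static int pickFirstBest(int[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            // Strict comparison: a later move with an equal value never replaces an earlier one.
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical root values: suppose move 0 is a slow forced win, move 1 an
        // immediate win and move 2 merely a good position. Both wins evaluate to
        // Integer.MAX_VALUE, so the search cannot tell them apart.
        int[] values = { Integer.MAX_VALUE, Integer.MAX_VALUE, 300 };
        System.out.println(pickFirstBest(values)); // prints 0
    }
}
```

Whichever winning move is generated first gets played, regardless of how many plies the win actually takes.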

There are two easy ways to make your Minimax algorithm prioritize fast wins over slow wins:

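The best option (because it also has many other advantages) is to use Iterative Deepening. You can look up the details, but the basic idea is that you first do a Minimax search with a depth limit of 1, then another one with a depth limit of 2, then one with a depth limit of 3, and so on. As soon as one of your searches proves a win, you can terminate the search and play that winning move. This will make your algorithm always prefer the shortest wins (because those are found first).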
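Alternatively, you can modify your heuristicValue() function to incorporate the search depth. For example, you could return Integer.MAX_VALUE - depth in winning positions. This will make faster wins actually have a slightly larger evaluation.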
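Both options can be tried out on a tiny hand-built game tree. The following is only a sketch under several assumptions: the names `FastWinSearch`, `TreeNode`, `WIN`, `search`, `bestMove`, `iterativeDeepening` and `demoTree` are hypothetical, the horizon heuristic is a constant 0 standing in for a real evaluation, a finite `WIN` constant is used in place of Integer.MAX_VALUE so that subtracting the depth stays well clear of overflow, and alpha-beta pruning is left out for brevity. In the demo tree the move "slow" is listed first and wins in three plies, while "fast" wins immediately.

```java
import java.util.Arrays;
import java.util.List;

public class FastWinSearch {
    static final int WIN = 1_000_000; // assumed win score, far from int overflow

    /** Hypothetical explicit game tree: a node without children is a finished game. */
    static final class TreeNode {
        final String move;
        final int result; // leaves only: +1 = win for MAX, -1 = loss, 0 = draw
        final List<TreeNode> children;

        TreeNode(String move, int result, TreeNode... children) {
            this.move = move;
            this.result = result;
            this.children = Arrays.asList(children);
        }
    }

    /** Depth-limited minimax; depthAware switches on the "win score minus depth" idea. */
    static int search(TreeNode node, int depth, int limit, boolean maximizing, boolean depthAware) {
        if (node.children.isEmpty()) {
            int plies = depthAware ? depth : 0;
            if (node.result > 0) return WIN - plies;  // sooner wins score higher
            if (node.result < 0) return -WIN + plies; // later losses score higher
            return 0;
        }
        if (depth == limit) return 0; // stand-in for a real heuristic at the horizon
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (TreeNode child : node.children) {
            int v = search(child, depth + 1, limit, !maximizing, depthAware);
            best = maximizing ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    /** Root move for MAX; ties keep the first move, like the strict ">" in the question. */
    static String bestMove(TreeNode root, int limit, boolean depthAware) {
        String best = null;
        int bestValue = Integer.MIN_VALUE;
        for (TreeNode child : root.children) {
            int v = search(child, 1, limit, false, depthAware);
            if (best == null || v > bestValue) {
                bestValue = v;
                best = child.move;
            }
        }
        return best;
    }

    /** Iterative deepening: stop at the first depth limit that proves a win. */
    static String iterativeDeepening(TreeNode root, int maxLimit) {
        for (int limit = 1; limit <= maxLimit; limit++) {
            if (search(root, 0, limit, true, false) == WIN) {
                return bestMove(root, limit, false);
            }
        }
        return bestMove(root, maxLimit, false); // no win proven: fall back to deepest search
    }

    static TreeNode demoTree() {
        TreeNode slow = new TreeNode("slow", 0,
                new TreeNode("reply", 0, new TreeNode("finish", +1)));
        TreeNode fast = new TreeNode("fast", +1);
        return new TreeNode("root", 0, slow, fast);
    }

    public static void main(String[] args) {
        TreeNode root = demoTree();
        System.out.println(bestMove(root, 64, false));    // prints slow
        System.out.println(bestMove(root, 64, true));     // prints fast
        System.out.println(iterativeDeepening(root, 64)); // prints fast
    }
}
```

With plain scoring both root moves evaluate to the same win value and the first one listed is kept. Either subtracting the depth or deepening iteratively makes the immediate win the preferred move. The sketch also adds the depth back for losses so that a losing player holds out as long as possible; that symmetric part is an extra design choice, not something the answer requires.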

Interesting issue with MiniMax algorithm. What can cause this behaviour?
