I have been trying to learn the minimax algorithm, and I ran into a bug I can't figure out how to fix. Code:
private List<Integer> generatemoves(int[] evalFields) {
    List<Integer> nextMoves = new ArrayList<Integer>();
    for (int i = 0; i < evalFields.length; i++) {
        if (evalFields[i] == 0) {
            nextMoves.add(i);
        }
    }
    return nextMoves;
}
private int evaluateLine(int p1, int p2, int p3, int[] evalFields) {
    int score = 0;
    // First cell: +1 for X (1), -1 for O (10)
    if (evalFields[p1] == 1) {
        score = 1;
    } else if (evalFields[p1] == 10) {
        score = -1;
    }
    // Second cell
    if (evalFields[p2] == 1) {
        if (score == 1) {
            score = 10;
        } else if (score == -1) {
            return 0;
        } else {
            score = 1;
        }
    } else if (evalFields[p2] == 10) {
        if (score == -1) {
            score = -10;
        } else if (score == 1) {
            return 0;
        } else {
            score = -1;
        }
    }
    // Third cell
    if (evalFields[p3] == 1) {
        if (score > 0) {
            score *= 10;
        } else if (score < 0) {
            return 0;
        } else {
            score = 1;
        }
    } else if (evalFields[p3] == 10) {
        if (score < 0) {
            score *= 10;
        } else if (score > 0) { // an X is already on this line, so it is worth 0
            return 0;
        } else {
            score = -1;
        }
    }
    return score;
}
private int evaluateBoard(int[] evalFields) {
    int score = 0;
    score += evaluateLine(0, 1, 2, evalFields);
    score += evaluateLine(3, 4, 5, evalFields);
    score += evaluateLine(6, 7, 8, evalFields);
    score += evaluateLine(0, 3, 6, evalFields);
    score += evaluateLine(1, 4, 7, evalFields);
    score += evaluateLine(2, 5, 8, evalFields);
    score += evaluateLine(0, 4, 8, evalFields);
    score += evaluateLine(2, 4, 6, evalFields);
    return score;
}
private int bestMove(int currentTurn, int[] board) {
    int move;
    int bestScore;
    if (currentTurn == 1) {
        bestScore = Integer.MIN_VALUE;
    } else {
        bestScore = Integer.MAX_VALUE;
    }
    List<Integer> nextMoves = generatemoves(board);
    List<Integer> bestScores = new ArrayList<Integer>();
    for (int i = 0; i < nextMoves.size(); i++) {
        // Copy the board and try the move for the player whose turn it is
        int[] newBoards = new int[9];
        for (int j = 0; j < board.length; j++) {
            newBoards[j] = board[j];
        }
        newBoards[nextMoves.get(i)] = currentTurn;
        bestScores.add(evaluateBoard(newBoards));
    }
    for (int scores : bestScores) {
        if (currentTurn == 1) {
            if (scores > bestScore) bestScore = scores;
        } else {
            if (scores < bestScore) bestScore = scores;
        }
    }
    move = nextMoves.get(bestScores.indexOf(bestScore));
    return move;
}
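This is the most relevant part of the code. What it does, or what I think it does, is generate every possible move from the board (called fields). Then it computes a score for each of those moves and plays the move that leads to the highest or lowest score: X (1) tries to get the highest score, O (10) the lowest. The bug is that when the player starts and takes the middle square, the AI plays normally, but after the player's second turn the AI starts behaving strangely: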
[ ][ ][ ] [O][ ][ ] [O][ ][O]
[ ][x][ ] => [ ][x][ ] => [x][x][ ]
[ ][ ][ ] [ ][ ][ ] [ ][ ][ ]
If the player chooses this instead:
[O][ ][ ] [O][ ][ ]
[ ][x][x] => [O][x][x]
[ ][ ][ ] [ ][ ][ ]
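then the AI plays normally. I don't know what is going wrong, or even whether I have understood the minimax algorithm correctly.
EDIT: I added this code, but it still has the same problem: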
private int[] evaluateMove(int[] board, int currentTurn) {
    int bestScore;
    int currentScore;
    int bestMove = -1;
    if (currentTurn == 1) {
        bestScore = Integer.MIN_VALUE;
    } else {
        bestScore = Integer.MAX_VALUE;
    }
    List<Integer> nextMoves = generatemoves(board);
    if (nextMoves.isEmpty()) {
        bestScore = evaluateTheBoard(board);
    } else {
        for (int move : nextMoves) {
            int[] nextBoard = new int[9];
            for (int i = 0; i < nextBoard.length; i++) {
                nextBoard[i] = board[i];
            }
            nextBoard[move] = currentTurn;
            currentScore = evaluateMove(nextBoard, nextTurn())[0];
            if (currentTurn == 1) {
                if (currentScore > bestScore) {
                    bestScore = currentScore;
                    bestMove = move;
                }
            } else {
                if (currentScore < bestScore) {
                    bestScore = currentScore;
                    bestMove = move;
                }
            }
        }
    }
    return new int[] {bestScore, bestMove};
}
Answer (score: 0)
I think you have misunderstood how to look ahead in a game like this. Don't total up the values returned by evaluateLine.
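Instead of summing evaluateLine scores, the only thing the search needs from a board on its own is whether somebody has already completed a line. A minimal sketch, assuming the question's encoding (0 = empty, 1 = X, 10 = O) and cells numbered 0 to 8 row by row; the class name `WinCheck`, the `LINES` table and `winner()` are illustrative names, not part of the original code.

```java
// Hypothetical helper (not from the question): detects a completed 3-in-a-row.
// Assumes 0 = empty, 1 = X, 10 = O, cells 0..8 numbered row by row.
class WinCheck {
    // The 8 possible winning lines on a 3x3 board.
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},  // rows
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},  // columns
        {0, 4, 8}, {2, 4, 6}              // diagonals
    };

    // Returns the marker (1 or 10) that owns a complete line, or 0 if nobody has won.
    static int winner(int[] board) {
        for (int[] line : LINES) {
            int a = board[line[0]];
            if (a != 0 && a == board[line[1]] && a == board[line[2]]) {
                return a;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] board = {10, 0, 10, 1, 1, 1, 0, 0, 0}; // X has the middle row
        System.out.println(winner(board)); // prints 1
    }
}
```

This plays the role of WhiteHasWon/BlackHasWon in the pseudocode: a single scan of the 8 lines, with no partial credit for unfinished lines.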
Here is pseudocode for the minimax score of a tic-tac-toe board (what evaluateBoard should return). Note that evaluateBoard needs to know whose turn it is (currentTurn).
function evaluateBoard(board, currentTurn)
    // Check whether the game has already ended:
    if WhiteHasWon then return (-10, none)
    if BlackHasWon then return (+10, none)
    // WhiteHasWon returns true if there exist one or more winning 3-in-a-row lines for White.
    // (You will have to scan all 8 possible 3-in-a-row lines of White pieces.)
    // BlackHasWon returns true if there exist one or more winning 3-in-a-row lines for Black.
    if no legal moves then return (0, none)   // draw
    // The game isn't over yet, so look ahead:
    bestMove = notset
    resultScore = notset
    for each legal move i for currentTurn:
        nextBoard = copy of board
        apply move i to nextBoard
        score = evaluateBoard(nextBoard, NOT currentTurn).score
        if resultScore is notset or score is <better for currentTurn> than resultScore then
            resultScore = score
            bestMove = move i
    return (resultScore, bestMove)
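The pseudocode above translates fairly directly into Java. This is a sketch, not the answerer's code: it assumes the question's encoding (1 = X maximizes, playing the role of Black's +10; 10 = O minimizes, playing White's -10), returns `{score, move}` like the question's evaluateMove, and the names `Minimax`, `LINES` and `winner()` are hypothetical.

```java
import java.util.Arrays;

// Sketch of the recursive minimax from the pseudocode.
// Assumed encoding: 0 = empty, 1 = X (maximizer, +10), 10 = O (minimizer, -10).
class Minimax {
    static final int EMPTY = 0, X = 1, O = 10;
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6},
        {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}
    };

    // Returns 1 or 10 if that player has a complete line, else 0.
    static int winner(int[] board) {
        for (int[] l : LINES) {
            int a = board[l[0]];
            if (a != EMPTY && a == board[l[1]] && a == board[l[2]]) return a;
        }
        return EMPTY;
    }

    // Returns {score, move}; move is -1 when the game is already over.
    static int[] evaluateBoard(int[] board, int currentTurn) {
        int w = winner(board);
        if (w == X) return new int[] {10, -1};   // game over: X won
        if (w == O) return new int[] {-10, -1};  // game over: O won

        int bestScore = (currentTurn == X) ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        int bestMove = -1;
        int nextTurn = (currentTurn == X) ? O : X;  // the OTHER player moves next
        for (int i = 0; i < board.length; i++) {
            if (board[i] != EMPTY) continue;
            int[] nextBoard = Arrays.copyOf(board, board.length);
            nextBoard[i] = currentTurn;
            // Recurse: the value of this move is the value of the resulting position.
            int score = evaluateBoard(nextBoard, nextTurn)[0];
            boolean better = (currentTurn == X) ? score > bestScore : score < bestScore;
            if (better) {
                bestScore = score;
                bestMove = i;
            }
        }
        if (bestMove == -1) return new int[] {0, -1};  // no legal moves: draw
        return new int[] {bestScore, bestMove};
    }

    public static void main(String[] args) {
        // Position from the question: O top-left, X centre and middle-left; O to move.
        int[] board = {O, 0, 0, X, X, 0, 0, 0, 0};
        int[] result = evaluateBoard(board, O);
        System.out.println("score=" + result[0] + " move=" + result[1]);
    }
}
```

For the position where the question's AI went wrong, this search picks square 5 (blocking X's middle row) with a score of 0, because every other O move lets X complete the row on the next turn.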
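One very important difference between your version and mine is that mine is recursive; yours only looks one level deep. I call evaluateBoard from inside evaluateBoard, which would be an infinite loop if we weren't careful (once the board fills up it doesn't go any deeper, so it isn't actually infinite).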
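Another difference is that you add scores together when you shouldn't. The score in tic-tac-toe is -10, 0, or 10, and you only know it once you have looked ahead to the end of the game. At each point you should pick the best move for the player whose turn it is and ignore all the other possibilities completely, because you only care about the "best" line of play. The score of the game equals the result of best play.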
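In minimax, expanding <better for currentTurn> is messy, which is why negamax is cleaner. White prefers low scores and Black prefers high scores, so you need some if statements to pick the appropriate preferred score. You already have this part (at the end of your bestMove code), but it needs to be evaluated inside the recursion, not at the end.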
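For comparison, here is a negamax sketch of the same search. Negamax scores every position from the point of view of the side to move, so a single `Math.max` replaces the "White prefers low, Black prefers high" branches. Assumptions: the question's encoding (0 = empty, 1 = X, 10 = O), positions reached by legal play (so a completed line always belongs to the player who just moved), and the hypothetical names `Negamax`, `negamax()` and `bestMove()`.

```java
import java.util.Arrays;

// Negamax sketch: score is +10 win, 0 draw, -10 loss for the side to move.
// Assumed encoding: 0 = empty, 1 = X, 10 = O; positions come from legal play.
class Negamax {
    static final int EMPTY = 0, X = 1, O = 10;
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6},
        {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}
    };

    static int winner(int[] board) {
        for (int[] l : LINES) {
            int a = board[l[0]];
            if (a != EMPTY && a == board[l[1]] && a == board[l[2]]) return a;
        }
        return EMPTY;
    }

    static int negamax(int[] board, int toMove) {
        // A completed line was made by the previous player, so the side to move has lost.
        if (winner(board) != EMPTY) return -10;
        int other = (toMove == X) ? O : X;
        int best = Integer.MIN_VALUE;
        for (int i = 0; i < board.length; i++) {
            if (board[i] != EMPTY) continue;
            int[] next = Arrays.copyOf(board, board.length);
            next[i] = toMove;
            // The opponent's best result is our worst, hence the minus sign.
            best = Math.max(best, -negamax(next, other));
        }
        return (best == Integer.MIN_VALUE) ? 0 : best;  // no legal moves: draw
    }

    // Returns the square (0..8) with the highest negamax score for the side to move.
    static int bestMove(int[] board, int toMove) {
        int other = (toMove == X) ? O : X;
        int best = Integer.MIN_VALUE;
        int move = -1;
        for (int i = 0; i < board.length; i++) {
            if (board[i] != EMPTY) continue;
            int[] next = Arrays.copyOf(board, board.length);
            next[i] = toMove;
            int score = -negamax(next, other);
            if (score > best) {
                best = score;
                move = i;
            }
        }
        return move;
    }

    public static void main(String[] args) {
        int[] board = {O, 0, 0, X, X, 0, 0, 0, 0};  // question's position, O to move
        System.out.println(bestMove(board, O));
    }
}
```

Both players share one code path here; the only player-specific detail left is which marker to place, which is why negamax tends to be shorter than the two-branch minimax.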