First of all, I apologize for the slightly inaccurate title; I just didn't want it to be 30 words long. The alpha/beta pruning I implemented greatly reduced the number of evaluations when I applied it to my TicTacToe game, as you can see for yourself below.
Each pair of evaluation counts was measured on the same input game state.
The problem came up when I wanted to apply pruning to the neural-network checkers player I've been working on, which is the goal of this whole thing; I only started with the TicTacToe game to experiment with MiniMax and alpha/beta, since I had never worked with these algorithms before.
Here is the same experiment with the NN:
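As a sanity check that this kind of measurement behaves as expected, here is a self-contained toy benchmark (my own sketch, not the code from the question; the names `EvalCount` and `evaluate` are mine): plain minimax and alpha-beta written in the same style as the methods below, searching a simple Nim-like game (players alternately remove 1-3 stones) to a fixed depth while counting calls to the evaluation function.

```java
// Toy benchmark: count leaf evaluations for plain minimax vs alpha-beta
// on the same game tree. The game is Nim-like: remove 1-3 stones per turn.
public class EvalCount {

    static long evals = 0;

    // Deterministic heuristic at the depth limit (consistent values are
    // what allows alpha-beta to prune): positions where the stone count is
    // a multiple of 4 are losing for the side to move.
    static double evaluate(int stones, boolean maxTurn) {
        evals++;
        return (stones % 4 == 0) == maxTurn ? -1 : 1;
    }

    static double miniMax(int stones, int depth, boolean maxTurn) {
        if (stones == 0)
            // The previous player took the last stone and won
            return maxTurn ? -1 : 1;
        if (depth == 0)
            return evaluate(stones, maxTurn);
        double best = maxTurn ? -1 : 1;
        for (int take = 1; take <= Math.min(3, stones); take++) {
            double score = miniMax(stones - take, depth - 1, !maxTurn);
            if (maxTurn ? score > best : score < best)
                best = score;
        }
        return best;
    }

    static double alphaBeta(int stones, int depth, boolean maxTurn,
                            double alpha, double beta) {
        if (stones == 0)
            return maxTurn ? -1 : 1;
        if (depth == 0)
            return evaluate(stones, maxTurn);
        for (int take = 1; take <= Math.min(3, stones); take++) {
            double score = alphaBeta(stones - take, depth - 1, !maxTurn, alpha, beta);
            if (maxTurn) { if (score > alpha) alpha = score; }
            else         { if (score < beta)  beta  = score; }
            if (alpha >= beta)
                break; // same cutoff condition as in the question's code
        }
        return maxTurn ? alpha : beta;
    }

    public static void main(String[] args) {
        evals = 0;
        miniMax(30, 8, true);
        long plain = evals;
        evals = 0;
        alphaBeta(30, 8, true, -1, 1);
        long pruned = evals;
        System.out.println("minimax evals: " + plain + ", alpha-beta evals: " + pruned);
    }
}
```

Both searches return the same value, but alpha-beta reaches far fewer leaves, which is the kind of difference the tables above are measuring.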
Now for the code (the checkers version; let me know if you'd like to see the TicTacToe one, they are almost identical).
I'm only going to paste the opening of the two methods once, since it is exactly the same for both, but I'll show both signatures because they differ slightly.
Small edit to make the code clearer.
Board is the object that keeps track of the pieces, the available moves, whose turn it is, whether the game has been won or drawn, and so on. Move is the object that contains all the information pertaining to a move; the clone I make in the first line of each method simply copies the given board and applies the given move to it via the constructor.
private double miniMax(Board b, Move m, int depth) {
and
private double alphaBeta(Board b, Move m, int depth, double alpha, double beta) {
The opening of both methods:
Testboard clone = new Testboard(b, m);
// Making a clone of the board in order to
// avoid making changes to the original one

if (clone.isGameOver()) {
    if (clone.getLoser() == null)
        // It's a draw, evaluation = 0
        return 0;
    if (clone.getLoser() == Color.BLACK)
        // White (Max) won, evaluation = 1
        return 1;
    // Black (Min) won, evaluation = -1
    return -1;
}

if (depth == 0)
    // Reached the end of the search, returning the current evaluation of the board
    return getEvaluation(clone);
Regular MiniMax continuation:
// If it's not game over
if (clone.getTurn() == Color.WHITE) {
    // It's White's turn (Max player)
    double max = -1;
    for (Move move : clone.getMoves()) {
        // The minimax value of each child node (available move) is calculated
        double score = miniMax(clone, move, depth - 1);
        // Only the highest score is kept
        if (score > max)
            max = score;
    }
    // And is returned
    return max;
}
// It's Black's turn (Min player)
double min = 1;
for (Move move : clone.getMoves()) {
    // The minimax value of each child node (available move) is calculated
    double score = miniMax(clone, move, depth - 1);
    // Only the lowest score is kept
    if (score < min)
        min = score;
}
// And is returned
return min;
}
MiniMax with alpha/beta pruning continuation:
// If it's not game over
if (clone.getTurn() == Color.WHITE) {
    // It's White's turn (Max player)
    for (Move move : clone.getMoves()) {
        // The minimax value of each child node (available move) is calculated
        double score = alphaBeta(clone, move, depth - 1, alpha, beta);
        if (score > alpha)
            // If this score is greater than alpha,
            // it becomes the new highest score
            alpha = score;
        if (alpha >= beta)
            // The loop is cut off early as soon as alpha is
            // greater than or equal to beta
            break;
    }
    // The alpha value is returned
    return alpha;
}
// It's Black's turn (Min player)
for (Move move : clone.getMoves()) {
    // The minimax value of each child node (available move) is calculated
    double score = alphaBeta(clone, move, depth - 1, alpha, beta);
    if (score < beta)
        // If this score is lower than beta,
        // it becomes the new lowest score
        beta = score;
    if (alpha >= beta)
        // The loop is cut off early as soon as alpha is
        // greater than or equal to beta
        break;
}
// The beta value is returned
return beta;
}
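For context, here is a sketch of how such a method might be driven from the root (my addition, not part of the question; `findBestMove` is a hypothetical name, the sketch assumes the question's `Board`/`Move` API and that White, the maximizing side, is to move, so it is illustrative rather than runnable on its own):

```java
// Hypothetical root driver: picks the move whose alpha-beta value is highest.
// The window starts at the widest values the evaluation can take, here [-1, 1]
// to match the win/loss/draw scores used above.
private Move findBestMove(Board b, int depth) {
    Move bestMove = null;
    double bestScore = -1;
    for (Move move : b.getMoves()) {
        // Passing bestScore as alpha tightens the window as better moves are found
        double score = alphaBeta(b, move, depth - 1, bestScore, 1);
        if (bestMove == null || score > bestScore) {
            bestScore = score;
            bestMove = move;
        }
    }
    return bestMove;
}
```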
Honestly, I'm stuck, and I don't know what I can do to try to figure out what's going on. I've tried MiniMax + A/B on several different and even randomly generated neural networks, but I've never seen an improvement in the number of evaluations. I hope someone can shed some light on this situation, thanks!
Answer 0 (score: 0)
Answering my own question, since @maraca helped me figure this out but only replied in a comment — thanks!
There was nothing wrong with the code I posted; the problem was the evaluation function I was using once the search reached the depth limit.
I was using a still-untrained neural network that was basically just spitting out random values, which forced MiniMax + A/B to go through all the nodes, since there was no consistency in the answers — and consistency turns out to be necessary for the pruning to take place.
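The effect can be illustrated with a toy experiment (my own sketch, not the checkers code; `ConsistencyDemo` and its members are hypothetical names): run the same alpha-beta cutoff test over a uniform ternary tree, once with a perfectly consistent evaluation that always returns 0 (a stand-in for a stable heuristic) and once with an evaluation that returns fresh random values (a stand-in for an untrained network), counting leaf evaluations in each case.

```java
import java.util.Random;

// Compares how much alpha-beta can prune with a consistent evaluation
// versus a noisy, random one, on a uniform ternary tree of fixed depth.
public class ConsistencyDemo {

    static long evals = 0;
    static Random rng;

    static double alphaBeta(int depth, boolean maxTurn,
                            double alpha, double beta, boolean noisy) {
        if (depth == 0) {
            evals++;
            // noisy: random values in (-1, 1), like an untrained network;
            // consistent: the same value every time
            return noisy ? rng.nextDouble() * 2 - 1 : 0.0;
        }
        for (int i = 0; i < 3; i++) {
            double score = alphaBeta(depth - 1, !maxTurn, alpha, beta, noisy);
            if (maxTurn) { if (score > alpha) alpha = score; }
            else         { if (score < beta)  beta  = score; }
            if (alpha >= beta)
                break; // same cutoff condition as in the question's code
        }
        return maxTurn ? alpha : beta;
    }

    static long count(boolean noisy) {
        rng = new Random(42); // fixed seed so runs are repeatable
        evals = 0;
        alphaBeta(8, true, -1, 1, noisy);
        return evals;
    }

    public static void main(String[] args) {
        System.out.println("consistent eval: " + count(false) + " evaluations");
        System.out.println("random eval:     " + count(true) + " evaluations");
    }
}
```

With the consistent evaluation the cutoff fires constantly and only a small fraction of the leaves are ever evaluated; with random values the same code prunes far less, which matches the behavior observed with the untrained network.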