在我的方法newminimax499中,我有一个使用memoization和alpha beta修剪的minimax算法。该方法适用于3x3游戏,但是当我玩4x4游戏时,我会为计算机选择奇怪的意外位置。他仍然没有输球,但他似乎没有赢得胜利。为了说明这里的问题,从3x3和4x4的2场比赛开始。首先是来自3x3游戏的场景,其中玩家是X并且是第一步:
这还不错,实际上它是人们期望计算机做的事情。现在来看看4x4游戏中的场景。 O再次是计算机而X开始:
正如你所看到的,计算机只是一个接一个地按系统顺序放置Os,并且只有当它有潜在的胜利时才打破该命令以阻止X.这是非常防守的比赛,不像在3x3比赛中看到的那样。那么为什么3x3和4x4的方法表现不同?
以下是代码:
//This method returns a 2 element int array containing the position of the best possible
//next move and the score it yields. Utilizes memoization and alpha beta
//pruning to achieve better performance.
public int[] newminimax499(int a, int b){
//int bestScore = (turn == 'O') ? +9 : -9; //X is minimizer, O is maximizer
int bestPos=-1;
int alpha= a;
int beta= b;
int currentScore;
//boardShow();
String stateString = "";
for (int i=0; i<state.length; i++)
stateString += state[i];
int[] oldAnswer = oldAnswers.get(stateString);
if (oldAnswer != null)
return oldAnswer;
if(isGameOver()!='N'){
int[] answer = {score(), bestPos};
oldAnswers.put (stateString, answer);
return answer;
}
else{
for(int x:getAvailableMoves()){
if(turn=='X'){ //X is minimizer
setX(x);
//System.out.println(stateID++);
currentScore = newminimax499(alpha, beta)[0];
revert(x);
if(currentScore<beta){
beta=currentScore;
bestPos=x;
}
if(alpha>=beta){
break;
}
}
else { //O is maximizer
setO(x);
//System.out.println(stateID++);
currentScore = newminimax499(alpha, beta)[0];
revert(x);
if(currentScore>alpha){
alpha=currentScore;
bestPos=x;
}
if(alpha>=beta){
break;
}
}
}
}
if(turn=='X'){
int[] answer = {beta, bestPos};
oldAnswers.put (stateString, answer);
return answer;
}
else {
int[] answer = {alpha, bestPos};
oldAnswers.put (stateString, answer);
return answer;
}
}
以下是您运行代码所需的其他组件和补充方法。 我的类State2中使用的字段和构造函数:
private char [] state; //Actual content of the board
private char turn; //Whose turn it is
private Map<String,int[]> oldAnswers; //Used for memoization. It saves every state along with the score it yielded which allows us to stop exploring the children of a certain node if a similar node's score has been previously calculated. The key is the board state(i.e OX------X for example), the int array is a 2 element array containing the score and position of last placed seed of the state.
private Map<Integer, int []> RowCol; //A mapping of positions from a board represented as a normal array to a board represented as a 2d array. For example: The position 0 maps to 0,0 on a 2d array board, 1 maps to 0,1 and so on.
private static int n; //Size of the board
private static int stateID; //An simple incrementer used to show number of recursive calls in the newminiax49 method.
private static int countX, countO; //Number of placed Xs and Os
private static int lastAdded; //Position of last placed seed
private char [][] DDState; //A 2d array representing the board. Contains the same values as state[]. Used for simplicity in functions that check the state of the board.
public State2(int n){
int a=0;
State2.n=n;
state=new char[n*n];
RowCol=new HashMap<Integer, int []>();
countX=0;
countO=0;
//Initializing the board with empty slots
for(int i = 0; i<state.length; i++){
state[i]='-';
}
//Mapping
for(int i=0; i<n; i++){
for(int j=0; j<n; j++){
RowCol.put(a, new int[]{i, j});
a++;
}
}
a=0;
DDState=new char[n][n];
//Initializing the 2d array with the values from state[](empty slots)
for(int i=0; i<n; i++){
for(int j=0; j<n; j++){
DDState[i][j]=state[a];
a++;
}
}
oldAnswers = new HashMap<String,int[]>();
}
补充方法:
getAvailableMoves,返回一个板上有空插槽的数组(即可能的下一步移动)。
public int[] getAvailableMoves(){
int count=0;
int i=0;
for(int j=0; j<state.length; j++){
if(state[j]=='-')
count++;
}
int [] availableSlots = new int[count];
for(int j=0; j<state.length; j++){
if(state[j]=='-')
availableSlots[i++]=j;
}
return availableSlots;
}
isGameOver2(),只需检查棋盘的当前状态是否游戏结束。返回一个字符'X','O','D'和'N',分别代表X赢,O赢,抽奖和不游戏。
public char isGameOver2(){
char turnOpp;
int count;
if(turn=='X'){
count=countO;
turnOpp='O';
}
else {
count=countX;
turnOpp='X';
}
if(count>=n){
for(int i=0; i<n; i++){
if(DDState[i][RowCol.get(lastAdded)[1]]!=turnOpp)
break;
if(i==(n-1)){
return turnOpp;
}
}
//Check row for win
for(int i=0; i<n; i++){
if(DDState[RowCol.get(lastAdded)[0]][i]!=turnOpp)
break;
if(i==(n-1)){
return turnOpp;
}
}
//Check diagonal for win
if(RowCol.get(lastAdded)[0] == RowCol.get(lastAdded)[1]){
//we're on a diagonal
for(int i = 0; i < n; i++){
if(DDState[i][i] != turnOpp)
break;
if(i == n-1){
return turnOpp;
}
}
}
//check anti diagonal
for(int i = 0; i<n; i++){
if(DDState[i][(n-1)-i] != turnOpp)
break;
if(i == n-1){
return turnOpp;
}
}
//check for draw
if((countX+countO)==(n*n))
return 'D';
}
return 'N';
}
boardShow,返回电路板当前状态的矩阵显示:
public void boardShow(){
if(n==3){
System.out.println(stateID);
for(int i=0; i<=6;i+=3)
System.out.println("["+state[i]+"]"+" ["+state[i+1]+"]"+" ["+state[i+2]+"]");
System.out.println("***********");
}
else {
System.out.println(stateID);
for(int i=0; i<=12;i+=4)
System.out.println("["+state[i]+"]"+" ["+state[i+1]+"]"+" ["+state[i+2]+"]"+" ["+state[i+3]+"]");
System.out.println("***********");
}
}
得分,是一个简单的评价函数,其中O赢取+10,X赢取-10,抽奖0:
public int score(){
if(isGameOver2()=='X')
return -10;
else if(isGameOver2()=='O')
return +10;
else
return 0;
}
种子制定者:
//Sets an X at a certain location and updates the turn, countX and lastAdded variables
public void setX(int i){
state[i]='X';
DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='X';
turn='O';
countX++;
lastAdded=i;
}
//Sets an O at a certain location and updates the turn, countO and lastAdded variables
public void setO(int i){
state[i]='O';
DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='O';
turn='X';
countO++;
lastAdded=i;
}
还原,只需恢复移动。例如,如果X已被放置在位置0,则revert(0)在其位置设置' - '并更新由setX更改的变量:
public void revert(int i){
state[i]='-';
DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='-';
if(turn=='X'){
turn = 'O';
countO--;
}
else {
turn = 'X';
countX--;
}
}
主要方法的外观如下:
public static void main(String[] args) {
State2 s=new State2(4);
int [] results=new int[2];
s.setX(0);
long startTime = System.currentTimeMillis();
results=s.newminimax499(Integer.MIN_VALUE,Integer.MAX_VALUE);
long endTime = System.currentTimeMillis();
System.out.println("Score: "+results[0]+" Position: "+ results[1]);
System.out.println("Run time: " + (endTime-startTime));
s.boardShow();
}
答案 0 :(得分:5)
我不相信这里有一个错误 - 如果O在一个较早的位置上玩,它会分叉,而如果它在中心播放,则会强制平局。据推测,4x4游戏更难赢/输,因此计算机无动于衷地选择了第一个开放式广场。
下面,1
表示O的强制响应,2
表示分叉移动X,?
表示可能的胜利位置。
X|O|
-+-+-
2|X|?
-+-+-
?| |1
X| |O
-+-+-
X|2|?
-+-+-
1| |?
X|2|?
-+-+-
O|X|
-+-+-
|?|1
答案 1 :(得分:-1)
我放弃了试图绕过代码 - 房间开始对我的同事太吵了。
然而 - 我确实把我自己的解决方案放在一起 - 我认为这可能值得发布。它的设计并不是特别有效 - 事实上我仍在等待它在4X4测试中首次采取行动,但它确实在3X3上做了正确的事情(即对抗自己并赢得对抗)一个愚蠢的策略)。
它应该能够处理不同大小的电路板。对角线是使用对角线台阶从板的一侧到达另一侧的任何方格线。
一旦完成它的4X4运行,我就会发布它的动作进行比较。
public class TicTacToe {
/**
* Strategies for players.
*/
interface Strategy {
/**
* Think up a place to move.
*
* @param board - The current board position.
* @return Where to move.
*/
Optional<Board.Square> think(Board board, Player player);
}
/**
* Stupid strategy just finds the first empty square and plays it.
*/
static final Strategy stupid = (Board board, Player player) -> {
List<Board.Square> empty = board.getEmpties();
if (empty.size() > 0) {
return Optional.of(empty.get(0));
}
return Optional.empty();
};
/**
* Minimax strategy.
*/
static final Strategy minimax = new Strategy() {
// Divide by this at each depth step.
private static final double FACTOR = 3;
// Memoize each discovered best move.
Map<Player, Map<String, Board.Place>> best = new HashMap<>();
{
// One map for each player.
for (Player p : Player.values()) {
best.put(p, new HashMap<>());
}
}
@Override
public Optional<Board.Square> think(Board board, Player player) {
Optional<Board.Square> win = Optional.empty();
// Seen this one before?
Board.Place seen = best.get(player).get(board.toString());
if (seen == null) {
double winValue = Double.NEGATIVE_INFINITY;
for (Board.Square play : board.getEmpties()) {
double value = value(board, player, play, 1);
if (value > winValue) {
win = Optional.of(play);
winValue = value;
}
}
if (win.isPresent()) {
// Record my result.
best.get(player).put(board.toString(), win.get().place);
} else {
System.out.println("No best found for " + board);
}
} else {
// Reuse it.
win = Optional.of(board.square(seen));
}
return win;
}
private double value(Board board, Player player, Board.Square move, double scale) {
// My value.
double value = 0;
// Make the move.
board.mark(player, move);
try {
// A win?
if (!board.win()) {
// Remaining empties.
List<Board.Square> empties = board.getEmpties();
if (!empties.isEmpty()) {
// Record it's value.
double[] values = new double[empties.size()];
for (int i = 0; i < values.length; i++) {
// Try each further move negating and downscaling at each level.
values[i] = value(board, player.next(), empties.get(i), (scale / FACTOR) * -1);
}
// Add them up.
value += DoubleStream.of(values).sum();
}
} else {
// Somebody already won.
value = scale;
}
} finally {
//System.out.println("Value of " + board + " to player " + player + " = " + value);
// Unroll the move.
board.unmark(player, move);
}
return value;
}
};
/**
* There are exactly two players.
*/
enum Player {
O, X;
/**
* Each player has a strategy.
*
* Defaults to stupid.
*/
private Strategy strategy = stupid;
/**
* What mark they make on the board.
*
* @return String representing the mark they make.
*/
public char mark() {
// The mark they make is the first character of their name.
return name().charAt(0);
}
/**
* Who is the next player.
*
* @return The other player.
*/
public Player next() {
return other(this);
}
/**
* Who is the other player.
*
* @return The other player.
*/
static Player other(Player it) {
return it == O ? X : O;
}
public Strategy getStrategy() {
return strategy;
}
public Player setStrategy(Strategy strategy) {
this.strategy = strategy;
return this;
}
}
/**
* The board.
*/
static class Board {
/**
* Top left corner is 0,0. Access is [y][x].
*/
final Square[][] board;
// The board as columns.
final List<List<Square>> columns;
// The dagonals.
final List<List<Square>> diagonals;
// For sense checking.
final int width;
final int height;
// Counts for each player - useful optimisation.
final int[] counts = new int[Player.values().length];
// A clear square.
private static final char Clear = ' ';
public Board(int width, int height) {
this.width = width;
this.height = height;
board = new Square[height][];
for (int y = 0; y < height; y++) {
board[y] = new Square[width];
// Fill it.
for (int x = 0; x < width; x++) {
board[y][x] = new Square(x, y);
}
}
// Build my columns lists.
columns = new ArrayList<>();
for (int x = 0; x < width; x++) {
List<Square> column = new ArrayList<>();
for (int y = 0; y < height; y++) {
column.add(board[y][x]);
}
columns.add(column);
}
// And the diagonals.
diagonals = new ArrayList<>();
for (int y = 0; y <= height - width; y++) {
List<Square> diagonal = new ArrayList<>();
// Left to right.
for (int x = 0; x < width; x++) {
diagonal.add(board[y + x][x]);
}
diagonals.add(diagonal);
// Right to left.
diagonal = new ArrayList<>();
for (int x = 0; x < width; x++) {
diagonal.add(board[y + x][width - 1 - x]);
}
diagonals.add(diagonal);
}
}
public Board(int size) {
this(size, size);
}
private Stream<List<Square>> asRows() {
// Map each row to a list.
return Arrays.stream(board).map(r -> Arrays.asList(r));
}
private Stream<List<Square>> asCols() {
// Walk the columns.
return columns.stream();
}
private Stream<List<Square>> asDiagonals() {
// Walk the diagonals.
return diagonals.stream();
}
private boolean winning(List<Square> row) {
//System.out.println("winner(" + row + ")");
if (row.isEmpty()) {
return false;
}
Optional<Player> owner = row.get(0).owner;
// First square owned.
if (!owner.isPresent()) {
return false;
}
return !row.stream()
//.peek(s -> System.out.println(s))
.anyMatch((s) -> (!s.owner.isPresent() || s.owner.get() != owner.get()));
}
public Square square(Place place) {
return board[place.y][place.x];
}
private Optional<Player> getWinner() {
Optional<Player> winner = Optional.empty();
// Must be at least enough plays by one player to reach across/down the board.
OptionalInt min = IntStream.of(counts).min();
if (!min.isPresent() || min.getAsInt() < Math.min(width, height)) {
// Nobody can posibly have won.
return winner;
}
List<List<Square>> winners = asRows()
.filter(r -> winning(r))
.collect(Collectors.toList());
if (winners.isEmpty()) {
winners = asCols()
.filter(r -> winning(r))
.collect(Collectors.toList());
}
if (winners.isEmpty()) {
winners = asDiagonals()
.filter(r -> winning(r))
.collect(Collectors.toList());
}
if (!winners.isEmpty()) {
winner = winners.get(0).get(0).owner;
}
return winner;
}
private boolean isDrawn() {
// All square occupied - counts add up to width*height.
return IntStream.of(counts).sum() == width * height;
}
private boolean win() {
return getWinner().isPresent();
}
/**
* One square of the board.
*
* Remember that a Square is ON a board - i.e. it is a sub-object of a Board. Do not attempt to memoize Squares as they will hold a reference to their parent board.
*/
class Square {
// The owner of it.
Optional<Player> owner = Optional.empty();
// where it is on the board.
final Place place;
public Square(Place place) {
this.place = place;
}
public Square(int x, int y) {
this(new Place(x, y));
}
public char getMark() {
return owner.isPresent() ? owner.get().mark() : Clear;
}
public boolean isTaken() {
return owner.isPresent();
}
@Override
public String toString() {
return place + "(" + (owner.isPresent() ? owner.get().toString() : "") + ")";
}
private void take(Player player) {
if (!isTaken()) {
owner = Optional.of(player);
// Keep track of the counts.
counts[player.ordinal()] += 1;
} else {
throw new IllegalStateException("Attempt to take taken square!"
+ " Square=" + this
+ " Player=" + player
+ " board=" + Board.this.toString()
);
}
}
private void release(Player player) {
if (isTaken()) {
if (owner.get() == player) {
owner = Optional.empty();
// Keep track of the counts.
counts[player.ordinal()] -= 1;
} else {
throw new IllegalStateException("Attempt to release other player's square!"
+ " Square=" + this
+ " Player=" + player
+ " board=" + Board.this.toString()
);
}
} else {
throw new IllegalStateException("Attempt to release untaken square!"
+ " Square=" + this
+ " Player=" + player
+ " board=" + Board.this.toString()
);
}
}
}
private List<Square> getEmpties() {
// Walk all squares.
return Arrays.stream(board)
// Unroll each row.
.flatMap(l -> Arrays.stream(l))
// All not taken.
.filter(s -> !s.isTaken())
// As a list.
.collect(Collectors.toList());
}
/**
* A place on the board.
*/
class Place {
final int x;
final int y;
public Place(int x, int y) {
// Sense check.
if (x < 0 || x >= width) {
throw new IllegalArgumentException("Off board: x = " + x);
}
if (y < 0 || y >= height) {
throw new IllegalArgumentException("Off board: x = " + x);
}
this.x = x;
this.y = y;
}
@Override
public String toString() {
return "{" + x + "," + y + '}';
}
}
/**
* Mark a place on the board.
*
* @param player
*/
public void mark(Player player, Square square) {
// Take the square.
square.take(player);
}
/**
* UnMark a place on the board.
*
* Confirms the mark first.
*/
public void unmark(Player player, Square square) {
square.release(player);
}
@Override
public String toString() {
return Arrays.stream(board)
.map(r -> Arrays.stream(r)
.map(s -> String.valueOf(s.getMark()))
.collect(Collectors.joining("", "[", "]")))
.collect(Collectors.joining(""));
}
}
class Game {
// The current board state.
final Board board;
public Game(Board board) {
this.board = board;
}
public Game(int size) {
this(new Board(size));
}
public Game(int width, int height) {
this(new Board(width, height));
}
private void play(Player firstPlayer) {
boolean gameFinished = false;
Player player = firstPlayer;
do {
Optional<Board.Square> move = player.getStrategy().think(board, player);
if (move.isPresent()) {
board.mark(player, move.get());
System.out.println("Moved: " + move.get() + " -> " + board);
} else {
System.out.println("No move!");
}
// Has anyone won?
Optional<Player> winner = board.getWinner();
if (winner.isPresent()) {
System.out.println("Player " + winner.get() + " won! " + board);
gameFinished = true;
} else {
if (board.isDrawn()) {
System.out.println("Draw! " + board);
gameFinished = true;
}
}
// Next player.
player = player.next();
} while (!gameFinished);
}
}
private void testStupid() {
// Both start off stupid.
new Game(3).play(Player.X);
}
private void testMinimax() {
// Test minimax strategy.
Board testBoard = new Board(3);
// Set it up like http://neverstopbuilding.com/minimax
testBoard.mark(Player.O, testBoard.board[0][0]);
testBoard.mark(Player.X, testBoard.board[0][2]);
testBoard.mark(Player.X, testBoard.board[1][0]);
testBoard.mark(Player.X, testBoard.board[2][0]);
testBoard.mark(Player.O, testBoard.board[2][1]);
testBoard.mark(Player.O, testBoard.board[2][2]);
// What does the strategy do?
Optional<Board.Square> play = minimax.think(testBoard, Player.X);
System.out.println("minimax(" + testBoard + ") = " + play);
}
private void test3X3Game() {
// Play a game.
// Change strategies.
new Game(3).play(Player.X);
}
private void test4X4Game() {
// Play a game.
new Game(4).play(Player.X);
}
public void test() {
testStupid();
testMinimax();
System.out.println("Test minmax vs stupid");
Player.X.setStrategy(minimax);
test3X3Game();
System.out.println("Test minmax vs minmax");
Player.O.setStrategy(minimax);
test3X3Game();
System.out.println("Test 4 X 4");
test4X4Game();
}
public static void main(String args[]) {
try {
new TicTacToe().test();
} catch (Throwable t) {
t.printStackTrace(System.err);
}
}
}
当对自己玩minimax时,它会移动:
Moved: {1,1}(X) -> [ ][ X ][ ]
Moved: {0,0}(O) -> [O ][ X ][ ]
Moved: {2,2}(X) -> [O ][ X ][ X]
Moved: {0,2}(O) -> [O ][ X ][O X]
Moved: {0,1}(X) -> [O ][XX ][O X]
Moved: {2,1}(O) -> [O ][XXO][O X]
Moved: {1,0}(X) -> [OX ][XXO][O X]
Moved: {1,2}(O) -> [OX ][XXO][OOX]
Moved: {2,0}(X) -> [OXX][XXO][OOX]
Draw! [OXX][XXO][OOX]
请注意,这个操作系统不 OP已发布,因此OP的算法显然存在问题。