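I am trying to create an AI agent that plays checkers using minimax. However, it does not move the correct pieces: it seems to move pieces at random, even pieces that cannot legally move.
I have rewritten the minimax and undo functions several times, because I believe the problem is that the state is not restored correctly on each undo, but I still get the same problem.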
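It is called as follows. Because any piece may be the one to move, the calling code loops over every position on the board and, whenever it finds a white piece (w), calls minimax with that position. It then checks whether the move minimax returns is valid; if it is valid and has the best utility score so far, it becomes the chosen move. After both loops finish, the best move is played.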
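The calling code itself is not reproduced here. A minimal sketch of the loop described above might look like the following. The names `choose_best_move`, `search` and `is_valid_move` are hypothetical stand-ins (for `minimaxAB` and the validity check), and the sketch assumes that a higher score is better for white; flip the comparison if white is the minimizing side.

```python
import math


def choose_best_move(board, search, is_valid_move):
    """Scan every square and keep the best valid move found for a white man.

    `search(board, row, col)` is assumed to return ((new_row, new_col), score)
    or (None, score); `is_valid_move(board, row, col, new_row, new_col)` is
    assumed to return a bool. Both are hypothetical stand-ins.
    """
    best_score = -math.inf
    best_move = None  # (row, col, new_row, new_col)
    for row in range(len(board)):
        for col in range(len(board[row])):
            if board[row][col] != 'w':
                continue  # only white men are considered, as in the description
            target, score = search(board, row, col)
            # The search may report "no move" as None or as (None, None).
            if target is None or target[0] is None:
                continue
            new_row, new_col = target
            if (is_valid_move(board, row, col, new_row, new_col)
                    and score > best_score):
                best_score = score
                best_move = (row, col, new_row, new_col)
    # Only after both loops finish is a move actually chosen.
    return best_move, best_score
```

Returning the origin square together with the destination keeps the two from getting separated, which matters when several pieces are scanned in the same pass.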

    def undo(self, state, oldRow, oldCol, newRow, newCol):
        if oldRow + 1 == newRow:
            if state[oldRow][oldCol] == 'b' and state[newRow][newCol] == 'B':
                temp = state[oldRow][oldCol]
                state[oldRow][oldCol] = 'b'
                state[newRow][newCol] = temp
            else:
                temp = state[oldRow][oldCol]
                state[oldRow][oldCol] = state[newRow][newCol]
                state[newRow][newCol] = temp
        elif oldRow - 1 == newRow:
            if state[oldRow][newRow] == 'w' and state[newRow][newCol] == 'W':
                temp = state[oldRow][oldCol]
                state[oldRow][oldCol] = 'w'
                state[newRow][newCol] = temp
            else:
                temp = state[oldRow][oldCol]
                state[oldRow][oldCol] = state[newRow][newCol]
                state[newRow][newCol] = temp
        elif oldRow + 2 == newRow:
            if state[oldRow][oldCol] == 'b' and state[newRow][newCol] == 'B':
                temp = state[oldRow][oldCol]
                state[oldRow][oldCol] = 'b'
                state[newRow][newCol] = temp
                state[oldRow + 1][int((oldCol + newCol) / 2)] = 'w'
            else:
                if state[newRow][newCol] == 'b' or state[newRow][newCol] == 'B':
                    temp = state[oldRow][oldCol]
                    state[oldRow][oldCol] = state[newRow][newCol]
                    state[newRow][newCol] = temp
                    state[oldRow + 1][int((oldCol + newCol) / 2)] = 'w'
                elif state[newRow][newCol] == 'W':
                    temp = state[oldRow][oldCol]
                    state[oldRow][oldCol] = state[newRow][newCol]
                    state[newRow][newCol] = temp
                    state[oldRow + 1][int((oldCol + newCol) / 2)] = 'b'
        elif oldRow - 2 == newRow:
            if state[oldRow][oldCol] == 'w' and state[newRow][newCol] == 'W':
                temp = state[oldRow][oldCol]
                state[oldRow][oldCol] = 'w'
                state[newRow][newCol] = temp
                state[oldRow - 1][int((oldCol + newCol) / 2)] = 'b'
            else:
                if state[newRow][newCol] == 'w' or state[newRow][newCol] == 'W':
                    temp = state[oldRow][oldCol]
                    state[oldRow][oldCol] = state[newRow][newCol]
                    state[newRow][newCol] = temp
                    state[oldRow - 1][int((oldCol + newCol) / 2)] = 'b'
                elif state[newRow][newCol] == 'B':
                    temp = state[oldRow][oldCol]
                    state[oldRow][oldCol] = state[newRow][newCol]
                    state[newRow][newCol] = temp
                    state[oldRow + 1][int((oldCol + newCol) / 2)] = 'w'
        return state

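Since the concern is whether the state is restored correctly, one alternative worth comparing against is to record what each move overwrites and restore exactly those squares, rather than inferring the previous contents from the coordinates. The following is only a sketch under assumptions: `apply_move`, `undo_move` and the `EMPTY` marker `'.'` are hypothetical names, only single steps and single jumps are handled, and promotion assumes white moves toward row 0 and black toward the last row (which is what the row arithmetic in `undo` suggests).

```python
EMPTY = '.'  # assumed marker for an empty square (not taken from the post)


def apply_move(state, old_row, old_col, new_row, new_col):
    """Apply a step or a single jump in place and return an undo record.

    The record lists (row, col, previous_contents) for every square touched.
    """
    piece = state[old_row][old_col]
    record = [
        (old_row, old_col, piece),
        (new_row, new_col, state[new_row][new_col]),
    ]
    if abs(new_row - old_row) == 2:
        # A jump: remember and remove the captured piece in the middle.
        mid_row = (old_row + new_row) // 2
        mid_col = (old_col + new_col) // 2
        record.append((mid_row, mid_col, state[mid_row][mid_col]))
        state[mid_row][mid_col] = EMPTY
    # Promotion (assumed directions: white toward row 0, black toward the end).
    if piece == 'w' and new_row == 0:
        piece = 'W'
    elif piece == 'b' and new_row == len(state) - 1:
        piece = 'B'
    state[old_row][old_col] = EMPTY
    state[new_row][new_col] = piece
    return record


def undo_move(state, record):
    """Restore every square listed in the record to its previous contents."""
    for row, col, previous in record:
        state[row][col] = previous
```

With a record like this, undoing never has to guess whether a piece was promoted or which colour was captured, because the original contents of each square are stored at the moment the move is made.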
    def minimaxAB(self, state, row, col, player, depth, alpha, beta):
        if depth == 0 or self.terminal_test(state):
            return None, self.utility(state)
        if player == HUMAN_PLAYER:  # maximizing player
            best = -math.inf
            bestRow = None
            bestCol = None
            for move in self.actions(state, row, col, player):
                newRow = move[0]
                newCol = move[1]
                _, val = self.minimaxAB(state, newRow, newCol, self.getEnemyPlayer(HUMAN_PLAYER), depth - 1, alpha, beta)
                # undo the move
                state = self.undo(state, row, col, newRow, newCol)
                if val > best:
                    bestRow, bestCol, best = newRow, newCol, val
                alpha = max(alpha, val)
                if alpha >= beta:
                    break
            next = bestRow, bestCol
            return next, best
        else:  # minimizing player
            best = math.inf
            bestRow = None
            bestCol = None
            for move in self.actions(state, row, col, player):
                newRow = move[0]
                newCol = move[1]
                _, val = self.minimaxAB(state, newRow, newCol, self.getEnemyPlayer(AI_PLAYER), depth - 1, alpha, beta)
                # undo the move
                state = self.undo(state, row, col, newRow, newCol)
                if val < best:
                    bestRow, bestCol, best = newRow, newCol, val
                beta = min(beta, val)
                if alpha >= beta:
                    break
            next = bestRow, bestCol
            return next, best

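For comparison, the usual shape of an alpha-beta search that mutates the state in place is: apply the move, recurse, then undo exactly that move. The sketch below shows that pairing on a deliberately tiny stand-in game rather than checkers, so it runs on its own. The game (one pile of stones, take 1 to 3, whoever takes the last stone wins) and the function name `alphabeta` are illustrative assumptions, not part of the checkers code.

```python
import math


def alphabeta(pile, maximizing, depth, alpha, beta):
    """Alpha-beta on a toy take-away game, mutating `pile` in place.

    `pile` is a one-element list holding the number of stones left.
    Returns (best_take, value), with value from the maximizer's viewpoint.
    """
    if pile[0] == 0:
        # The previous player took the last stone and therefore won.
        return None, (-1 if maximizing else 1)
    if depth == 0:
        return None, 0  # outcome unknown at the search horizon
    best_take = None
    best = -math.inf if maximizing else math.inf
    for take in range(1, min(3, pile[0]) + 1):
        pile[0] -= take  # 1. apply the move
        _, val = alphabeta(pile, not maximizing, depth - 1, alpha, beta)  # 2. recurse
        pile[0] += take  # 3. undo exactly the move applied in step 1
        if maximizing:
            if val > best:
                best, best_take = val, take
            alpha = max(alpha, best)
        else:
            if val < best:
                best, best_take = val, take
            beta = min(beta, best)
        if alpha >= beta:
            break  # the remaining moves cannot change the result
    return best_take, best
```

A quick sanity check for any apply/undo pair is that the state after the search is identical to the state before it, for example `pile = [5]; alphabeta(pile, True, 10, -math.inf, math.inf); assert pile == [5]`.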
It should make sensible, valid moves, but instead it makes two invalid moves at once.