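I'm trying to implement Monte Carlo tree search for an Ultimate Tic-Tac-Toe game (it's like tic-tac-toe, but with a larger board), and I'm having trouble with the selection phase of the algorithm. My search tree is made up of nodes, and the first part of the algorithm tries to select one of them to expand. However, I keep getting an error saying the maximum recursion depth has been reached: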
from math import inf, log, sqrt

def select_node(node, board, state, identity):
    """ Traverses the tree until the end criteria are met.
    Args:
        node: A tree node from which the search is traversing.
        board: The game setup.
        state: The state of the game.
        identity: The player's identity, either 1 or 2.
    Returns: A node from which the next stage of the search can proceed.
    """
    if len(node.child_nodes) == 0 and len(node.untried_actions) > 0:  # Node is root, choose it
        return node
    best_child = None
    if board.current_player(state) == identity:  # it's our turn
        best_child_utc = -inf
        for move in node.child_nodes:
            child_node = node.child_nodes[move]
            if child_node.visits == 0:
                best_child = child_node
                break
            child_UCB_score = (child_node.wins/child_node.visits) + 1.41 * sqrt(log(node.visits)/child_node.visits)
            if child_UCB_score > best_child_utc:
                best_child = child_node
                best_child_utc = child_UCB_score
    else:  # it's the opponent's turn
        best_child_utc = -inf
        for move in node.child_nodes:
            child_node = node.child_nodes[move]
            if child_node.visits == 0:
                best_child = child_node
                break
            child_UCB_score = 1 - ((child_node.wins/child_node.visits) + 1.41 * sqrt(log(node.visits)/child_node.visits))
            if child_UCB_score > best_child_utc:
                best_child = child_node
                best_child_utc = child_UCB_score
    if best_child is not None and len(best_child.child_nodes) > 0 and not board.is_ended(best_child.state):
        return select_node(best_child, board, best_child.state, identity)
    else:
        return best_child
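Is there a way to convert my algorithm from recursion to some kind of loop so that it doesn't hit the recursion limit? The game is played on a 9x9 board, so in theory the maximum recursion depth that can be reached is 81.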
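For reference, here is a rough sketch of what a loop-based version might look like. It tries to keep the same stopping rules as the recursive function above. The `Node` class is only a hypothetical stand-in for my real node type, and the `max_depth` guard and the `ucb_score` helper are additions that are not in my original code. Python's default recursion limit is 1000, so a depth of 81 should not normally trigger it. That makes me suspect the tree may contain a cycle (for example, a child that points back to an ancestor), in which case a plain loop would spin forever, hence the guard.

```python
from math import inf, log, sqrt


class Node:
    """Hypothetical stand-in for the tree node type used in the question."""

    def __init__(self, state, untried_actions=()):
        self.state = state
        self.child_nodes = {}  # maps move -> Node
        self.untried_actions = list(untried_actions)
        self.wins = 0
        self.visits = 0


def ucb_score(parent, child, our_turn, c=1.41):
    """Same UCB expression as in the question, inverted on the opponent's turn."""
    score = child.wins / child.visits + c * sqrt(log(parent.visits) / child.visits)
    return score if our_turn else 1 - score


def select_node(node, board, state, identity, max_depth=81):
    """Walk down the tree with a loop instead of recursive calls."""
    for _ in range(max_depth):
        # A childless node that still has untried actions is returned as-is.
        if len(node.child_nodes) == 0 and len(node.untried_actions) > 0:
            return node
        our_turn = board.current_player(state) == identity
        best_child, best_score = None, -inf
        for child in node.child_nodes.values():
            if child.visits == 0:  # prefer an unvisited child immediately
                best_child = child
                break
            score = ucb_score(node, child, our_turn)
            if score > best_score:
                best_child, best_score = child, score
        # Same stopping rule as the recursive version: stop when there is
        # nothing to descend into or the game has ended at the chosen child.
        if (best_child is None or len(best_child.child_nodes) == 0
                or board.is_ended(best_child.state)):
            return best_child
        # This step replaces the recursive call.
        node, state = best_child, best_child.state
    raise RuntimeError("selection went deeper than max_depth; the tree may contain a cycle")
```

If the `RuntimeError` fires on a 9x9 board, the problem is probably in how the tree is built (expansion or backpropagation) rather than in selection itself.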