Maximum recursion depth exceeded in the selection phase of Monte Carlo tree search

Time: 2017-10-24 04:53:00

Tags: python recursion tic-tac-toe montecarlo

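I'm trying to implement Monte Carlo tree search for ultimate tic-tac-toe (it's like tic-tac-toe, but with a bigger board) and I'm having trouble with the selection phase of the algorithm. My search tree is made up of nodes, and the first part of the algorithm tries to select one of them to expand. However, I keep getting an error saying that the maximum recursion depth has been exceeded: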

from math import inf, log, sqrt


def select_node(node, board, state, identity):
    """ Traverses the tree until the end criteria are met.

    Args:
        node:       A tree node from which the search is traversing.
        board:      The game setup.
        state:      The state of the game.
        identity:   The player's identity, either 1 or 2.

    Returns:        A node from which the next stage of the search can proceed.

    """

    if len(node.untried_actions) > 0:    # Node still has unexpanded moves: stop here and expand it
        return node

    best_child = None
    if board.current_player(state) == identity:    # it's our turn
        best_child_utc = -inf
        for move in node.child_nodes:
            child_node = node.child_nodes[move]
            if child_node.visits == 0:
                best_child = child_node
                break
            child_UCB_score = (child_node.wins/child_node.visits) + 1.41 * sqrt(log(node.visits)/child_node.visits)
            if child_UCB_score > best_child_utc:
                best_child = child_node
                best_child_utc = child_UCB_score
    else:    # it's the opponent's turn
        best_child_utc = -inf
        for move in node.child_nodes:
            child_node = node.child_nodes[move]
            if child_node.visits == 0:
                best_child = child_node
                break
            # Opponent's view: invert the win rate, but keep adding the exploration bonus
            child_UCB_score = (1 - child_node.wins/child_node.visits) + 1.41 * sqrt(log(node.visits)/child_node.visits)
            if child_UCB_score > best_child_utc:
                best_child = child_node
                best_child_utc = child_UCB_score


    if best_child is not None and len(best_child.child_nodes) > 0 and not board.is_ended(best_child.state):
        return select_node(best_child, board, best_child.state, identity)
    else:
        return best_child
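The score computed in both loops above is the UCB1 rule. Written out, with w_i and n_i the child's wins and visits, N the parent's visits, and c = 1.41 (roughly the square root of 2), and assuming w_i counts wins for `identity`, the two sides would normally use:

```latex
\begin{align}
\text{UCB}_i &= \frac{w_i}{n_i} + c\sqrt{\frac{\ln N}{n_i}} && \text{(our turn)} \\
\text{UCB}^{\text{opp}}_i &= \left(1 - \frac{w_i}{n_i}\right) + c\sqrt{\frac{\ln N}{n_i}} && \text{(opponent's turn)}
\end{align}
```

Only the win-rate term is inverted for the opponent; the exploration term stays positive for both players, so rarely visited moves are still favoured on either side.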

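Is there a way to convert my algorithm from recursion to some kind of loop so that it doesn't hit the recursion limit? This is on a 9x9 board, so in theory the maximum recursion depth it could reach is 81.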
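A recursive descent like this converts directly to a loop that reassigns `node` at each level. Note that CPython's default recursion limit is 1000 (see `sys.getrecursionlimit()`), so a tree at most 81 levels deep should not reach it on its own; hitting the limit suggests the descent never reaches a leaf, for example because the tree contains a cycle. One possible cause (an assumption, not confirmed by the question) is a node class whose `child_nodes` comes from a mutable default argument such as `child_nodes={}`, so every node shares one dict. The sketch below assumes nodes expose `state`, `visits`, `wins`, `untried_actions` and `child_nodes`, and that `board` has `current_player(state)` and `is_ended(state)`, as in the question. The `Node` dataclass, `ucb_score` helper and `max_steps` cap are hypothetical additions, and the `state` parameter is dropped because `node.state` is used instead.

```python
from dataclasses import dataclass, field
from math import inf, log, sqrt


@dataclass(eq=False)  # compare nodes by identity, not by (possibly cyclic) contents
class Node:
    # Hypothetical stand-in for the question's node class; only the
    # attributes that select_node reads are modelled here.
    state: object = None
    visits: int = 0
    wins: float = 0.0  # assumed to count wins for `identity`
    untried_actions: list = field(default_factory=list)  # a fresh list per node
    child_nodes: dict = field(default_factory=dict)      # a fresh dict per node


def ucb_score(child, parent_visits, ours, c=1.41):
    """UCB1 score from the point of view of the player choosing the move."""
    win_rate = child.wins / child.visits
    if not ours:
        win_rate = 1 - win_rate  # the opponent prefers moves that are bad for us
    return win_rate + c * sqrt(log(parent_visits) / child.visits)


def select_node(node, board, identity, max_steps=10_000):
    """Iterative selection: walk down the tree with a loop instead of recursion."""
    for _ in range(max_steps):
        # Stop at a node that can still be expanded, has no children, or is terminal.
        if node.untried_actions or not node.child_nodes or board.is_ended(node.state):
            return node
        ours = board.current_player(node.state) == identity
        best_child, best_score = None, -inf
        for child in node.child_nodes.values():
            if child.visits == 0:  # unvisited children are tried first
                best_child = child
                break
            score = ucb_score(child, node.visits, ours)
            if score > best_score:
                best_child, best_score = child, score
        node = best_child  # descend one level without growing the call stack
    raise RuntimeError("selection did not terminate; the tree may contain a cycle")
```

Because the loop reassigns `node` instead of calling itself, tree depth no longer affects the interpreter's stack. The `max_steps` cap turns a cyclic tree into an explicit error rather than an infinite loop, which makes a structural bug like a shared `child_nodes` dict easier to spot.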

0 answers