Getting 2 actions through Q-learning

Time: 2020-03-31 08:00:16

Tags: python q-learning blackjack

I am trying to build a Q-learning model for blackjack, but I don't know how I could have more than two actions. This is how I currently take an action:

def get_action(self, state):
    rewards = self._model.predict(np.array([state]), batch_size=1)
    if np.random.uniform(0, 1) < self._epsilon or self._learning == False:
        # Greedy: pick the action with the larger predicted Q-value
        if rewards[0][0] > rewards[0][1]:
            action = Constants.hit
        else:
            action = Constants.stay
    else:
        # Explore: pick a random action
        action = np.random.choice([Constants.hit, Constants.stay])

    self._last_state = state
    self._last_action = action
    self._last_target = rewards
    return action

And the model update:

def update(self,new_state,reward):
    if self._learning:
        rewards = self._model.predict([np.array([new_state])], batch_size=1)
        maxQ = rewards[0][0] if rewards[0][0] > rewards[0][1] else rewards[0][1]
        new = self._discount * maxQ

        if self._last_action == Constants.hit:
            self._last_target[0][0] = reward+new
        else:
            self._last_target[0][1] = reward+new

        # Update model
        self._model.fit(np.array([self._last_state]), self._last_target, batch_size=1, epochs=1, verbose=0)
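The per-action if/else in the update above only scales to two actions. The same Bellman target can be computed for any number of actions by taking `np.max` over the predicted Q-values and indexing the target row by the action that was taken. A minimal standalone sketch, with hypothetical numbers and the action encoded as an integer index (0 = hit, 1 = stay, 2 = double):

```python
import numpy as np

# Predicted Q-values for the successor state, one per action
# (hypothetical values, just for illustration).
next_q = np.array([[0.2, -0.1, 0.5]])   # hit, stay, double
reward, discount = 1.0, 0.9
last_action = 2                          # index of the action actually taken

# Bellman target: only the taken action's entry changes; the other
# entries keep their predicted values so they contribute no loss.
target = next_q.copy()
target[0][last_action] = reward + discount * np.max(next_q[0])
```

Storing actions as integer indices rather than named constants compared one by one is what removes the need for a branch per action.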

And the model initialization:

    model = Sequential()

    model.add(Dense(2, kernel_initializer='lecun_uniform', input_shape=(2,)))
    model.add(Activation('relu'))

    model.add(Dense(10, kernel_initializer='lecun_uniform'))
    model.add(Activation('relu'))

    model.add(Dense(4, kernel_initializer='lecun_uniform'))
    model.add(Activation('linear'))

    rms = RMSprop()
    model.compile(loss='mse', optimizer=rms)

Right now I can only hit or stand, and I would like to implement doubling down as well. How can I do this? Thanks in advance, and sorry for my bad English.
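One way the action selection could generalize is to give the final Dense layer one unit per action and pick the greedy action with `np.argmax` instead of comparing indices by hand. A sketch under those assumptions (`DOUBLE` and the three-element action list are not in the original code), keeping the question's convention that epsilon is the probability of acting greedily:

```python
import numpy as np

# Hypothetical action indices; the original code only defines hit and stay.
HIT, STAY, DOUBLE = 0, 1, 2
ACTIONS = [HIT, STAY, DOUBLE]

def get_action(q_values, epsilon, rng=np.random):
    """Pick the greedy action with probability epsilon, else a random one.

    q_values: shape (1, n_actions) prediction from a model whose final
    Dense layer has n_actions (here 3) units.
    """
    if rng.uniform(0, 1) < epsilon:
        # argmax works for any number of actions, unlike a pairwise compare
        return int(np.argmax(q_values[0]))
    return int(rng.choice(ACTIONS))
```

The update step would then index the target row by the chosen action's integer index, so adding a fourth or fifth action only changes the output layer size and the `ACTIONS` list.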

0 Answers:

No answers yet.