I am getting:
assert q_values.shape == (len(state_batch), self.nb_actions)
AssertionError
q_values.shape <class 'tuple'>: (1, 1, 10)
(len(state_batch), self.nb_actions) <class 'tuple'>: (1, 10)
It comes from the SARSA agent in the keras-rl library:
rl.agents.sarsa.SARSAAgent#compute_batch_q_values
batch = self.process_state_batch(state_batch)
q_values = self.model.predict_on_batch(batch)
assert q_values.shape == (len(state_batch), self.nb_actions)
Here is my code:
class MyEnv(Env):
    def __init__(self):
        self._reset()

    def _reset(self) -> None:
        self.i = 0

    def _get_obs(self) -> List[float]:
        return [1] * 20

    def reset(self) -> List[float]:
        self._reset()
        return self._get_obs()

model = Sequential()
model.add(Dense(units=20, activation='relu', input_shape=(1, 20)))
model.add(Dense(units=10, activation='softmax'))
logger.info(model.summary())
policy = BoltzmannQPolicy()
agent = SARSAAgent(model=model, nb_actions=10, policy=policy)
optimizer = Adam(lr=1e-3)
agent.compile(optimizer, metrics=['mae'])
env = MyEnv()
agent.fit(env, 1, verbose=2, visualize=True)
Could someone explain how to set up the dimensions and how they interact with the library? I have 20 inputs and want 10 outputs.
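Answer 0 (score: 2)
This particular error is caused by the input shape being (1, 20). If you use the input shape (20,), the error goes away.
In other words, SARSAAgent expects a model whose output is a two-dimensional tensor of shape (batch_size, nb_actions), whereas your model outputs a tensor of shape (batch_size, 1, 10). You can either drop the extra dimension in the model's input or flatten the output.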
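To make the shape argument concrete, here is a minimal shape-only sketch in plain Python (no Keras needed). The helpers dense_shape and flatten_shape are hypothetical names, not part of Keras or keras-rl; they only mirror the behaviour that a Dense layer acts on the last axis and that Flatten merges every axis except the batch axis. The batch shape (1, 1, 20) is an assumption, inferred from the (1, 1, 10) reported by the assertion.

```python
from math import prod

def dense_shape(input_shape, units):
    """Shape produced by a Dense layer: only the last axis changes."""
    return tuple(input_shape[:-1]) + (units,)

def flatten_shape(input_shape):
    """Shape produced by Flatten: keep the batch axis, merge the rest."""
    return (input_shape[0], prod(input_shape[1:]))

# Assumed shape of the batch reaching the model: (batch_size, 1, 20).
batch = (1, 1, 20)

# Model from the question: Dense(20) then Dense(10), no flattening.
q_shape_question = dense_shape(dense_shape(batch, 20), 10)

# Variant that flattens first, then applies the same two Dense layers.
q_shape_flattened = dense_shape(dense_shape(flatten_shape(batch), 20), 10)

print(q_shape_question)   # (1, 1, 10): fails the (1, 10) assertion
print(q_shape_flattened)  # (1, 10): matches (len(state_batch), nb_actions)
```

In Keras terms, the second variant would correspond to putting a Flatten layer in front of the Dense layers, for example Flatten(input_shape=(1, 20)).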
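Answer 1 (score: 1)
First, let's build a simple toy environment, a 1D maze:
[1,1,0,1,1,0,1,1,0]
There are two actions:
0: move to the next maze cell
1: jump, i.e. skip the next maze cell and move to the one after it
To implement our environment in gym, we need to implement 2 methods, step and reset: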
import gym
import numpy as np
from gym import spaces

class FooEnv(gym.Env):
    def __init__(self):
        self.maze = [1,1,0,1,1,0,1,1,0]
        self.curr_state = 0
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(1)

    def step(self, action):
        # Action 0 moves one cell forward, action 1 jumps two cells.
        if action == 0:
            self.curr_state += 1
        if action == 1:
            self.curr_state += 2
        if self.curr_state >= len(self.maze):
            # Walked off the end of the maze: the episode ends.
            reward = 0.
            done = True
        else:
            if self.maze[self.curr_state] == 0:
                # Landed on a 0 cell: the episode ends without reward.
                reward = 0.
                done = True
            else:
                reward = 1.
                done = False
        return np.array(self.curr_state), reward, done, {}

    def reset(self):
        self.curr_state = 0
        return np.array(self.curr_state)
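To see what the agent can achieve at best, the reward logic above can be traced by hand. The following is a stand-alone sketch that mirrors FooEnv.step in plain Python without gym; the names MAZE, step and rollout are introduced here for illustration only, and the sketch assumes every action is either 0 or 1.

```python
MAZE = [1, 1, 0, 1, 1, 0, 1, 1, 0]

def step(state, action):
    """Mirror of FooEnv.step: returns (next_state, reward, done)."""
    # Action 0 moves one cell forward, any other action (i.e. 1) jumps two.
    next_state = state + (1 if action == 0 else 2)
    # Leaving the maze or landing on a 0 cell ends the episode with no reward.
    if next_state >= len(MAZE) or MAZE[next_state] == 0:
        return next_state, 0.0, True
    return next_state, 1.0, False

def rollout(actions):
    """Play a fixed action sequence from state 0 and sum the rewards."""
    state, total = 0, 0.0
    for action in actions:
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return total

# Alternating "move" and "jump" avoids every 0 cell until the maze ends.
print(rollout([0, 1, 0, 1, 0, 1]))  # 5.0
# Jumping immediately lands on the 0 cell at index 2.
print(rollout([1]))  # 0.0
```

Under these rules an episode that alternates move and jump collects a reward of 1 on each of its first five steps, which gives a reference point for judging the trained agent.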
Now, given the current state, we want the NN to predict which action to take, 0 or 1.
model = Sequential()
model.add(Dense(units=16, activation='relu', input_shape=(1,)))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=2, activation='softmax'))
policy = BoltzmannQPolicy()
agent = SARSAAgent(model=model, nb_actions=2, policy=policy)
optimizer = Adam(lr=1e-3)
agent.compile(optimizer, metrics=['acc'])
env = FooEnv()
agent.fit(env, 10000, verbose=1, visualize=False)
# Test the trained agent using
# agent.test(env, nb_episodes=5, visualize=False)
Output
Training for 10000 steps ...
Interval 1 (0 steps performed)
10000/10000 [==============================] - 54s 5ms/step - reward: 0.6128
done, took 53.519 seconds
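If your environment is a 2D grid of size n x m, the input shape of the NN will be (n, m), as shown below, and the input is flattened before being passed to the Dense layers:
model.add(Flatten(input_shape=(n, m)))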
See this example.
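As a separate illustration of what flattening does to a grid observation, here is a small plain-Python sketch. The function flatten_grid is a hypothetical helper, not a Keras call, and it assumes the row-major ordering that a reshape of an (n, m) array to n * m values would produce.

```python
def flatten_grid(grid):
    """Row-major flattening of an n x m grid into a list of length n * m."""
    return [cell for row in grid for cell in row]

# A 2 x 3 grid observation (n = 2, m = 3).
grid = [[0, 1, 0],
        [1, 1, 0]]

flat = flatten_grid(grid)
print(flat)       # [0, 1, 0, 1, 1, 0]
print(len(flat))  # 6, i.e. n * m inputs for the first Dense layer
```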