I am trying to implement an actor-critic network using Keras (with TensorFlow 2.0 alpha, eager execution disabled), but there seems to be a bug in the Keras function that updates the actor network's weights.
I have left my print() statements in the code to show what I have investigated so far, and hopefully to make up for the fact that the code is incomplete and therefore not reproducible. Edit: these are my actor and critic models.
The function is called as follows; I print the shapes of all input arrays first:
# Networks optimization
print('Shapes of vars: states: {}, actions: {}, advantages: {}'.format(
    np.array(states).shape, np.array(actions).shape, np.array(advantages).shape))
self.a_opt([states, actions, advantages])  # call the Keras function written out below
# a print statement here is never reached
The function being called (which produces the "Incompatible shapes" error) looks like this:
def a_opt(self):
    """ Actor Optimization: Advantages + Entropy term to encourage exploration
    (Cf. https://arxiv.org/abs/1602.01783)
    """
    modelout = K.print_tensor(
        self.model.output, message="model output: " + str(K.int_shape(self.model.output)))
    action_pl = K.print_tensor(
        self.action_pl, message="action_pl: " + str(K.int_shape(self.action_pl)))
    weighted_actions = K.sum(action_pl * modelout, axis=1)
    weighted_actions = K.print_tensor(
        weighted_actions, message="weighted_actions: ")
    eligibility = K.log(weighted_actions + 1e-10) * \
        K.stop_gradient(self.advantages_pl)
    eligibility = K.print_tensor(eligibility, message="eligibility: ")
    entropy = K.sum(modelout * K.log(modelout + 1e-10), axis=1)
    entropy = K.print_tensor(entropy, message="entropy: ")
    loss = 0.001 * entropy - K.sum(eligibility)
    loss = K.print_tensor(loss, message="loss: ")
    updates = self.rms_optimizer.get_updates(loss=loss,
                                             params=self.model.trainable_weights)
    return K.function([self.model.input, self.action_pl, self.advantages_pl],
                      [], updates=updates)
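For reference, here is a minimal NumPy stand-in (hypothetical batch size and values, assuming a 3-action softmax policy and one-hot actions) that traces the shape each intermediate tensor in a_opt is expected to have:

```python
import numpy as np

batch = 4                                               # hypothetical batch size
modelout = np.full((batch, 3), 1 / 3)                   # softmax policy output, shape (batch, 3)
actions = np.eye(3)[np.random.randint(3, size=batch)]   # one-hot actions, shape (batch, 3)
advantages = np.random.rand(batch)                      # shape (batch,)

weighted_actions = (actions * modelout).sum(axis=1)          # (batch,)
eligibility = np.log(weighted_actions + 1e-10) * advantages  # (batch,)
entropy = (modelout * np.log(modelout + 1e-10)).sum(axis=1)  # (batch,)
loss = 0.001 * entropy - eligibility.sum()                   # (batch,)

print(weighted_actions.shape, eligibility.shape, entropy.shape, loss.shape)
```

So every elementwise product in the loss should involve two (batch, 3) tensors or two (batch,) tensors; nothing of width 44 should appear.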
So far so good, but running the program produces the following console output:
Shapes of vars: states: (999, 1, 44), actions: (999, 3), advantages: (999,)
action_pl: (None, 3)[[0.861626744 0.928109825 0.0259102583...]...]
model output: (None, 3)[[0.365334 0.333090335 0.301575601]]
Traceback (most recent call last):
File ".\actor_critic.py", line 85, in <module>
agent.train(marketSim, ac_args, summary_writer)
File "FILEPATH", line 115, in train
self.train_models(states, actions, rewards, done)
File "FILEPATH", line 76, in train_models
self.a_opt([states, actions, advantages])
File "C:\Python37\lib\site-packages\tensorflow\python\keras\backend.py", line 3096, in __call__
run_metadata=self.run_metadata)
File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1440, in __call__
run_metadata_ptr)
File "C:\Python37\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 548, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,44] vs. [1,3]
[[{{node gradients/mul_grad/BroadcastGradientArgs}}]]
[[Sum_1/_119]]
As you can see, the batch size is 999, each state has shape (1, 44), and each action has shape (3,). From the error message I guess that I am multiplying these two somewhere, but I cannot find where that happens.
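A minimal NumPy sketch (hypothetical arrays, just to reproduce the broadcast rule the error message refers to):

```python
import numpy as np

state_like = np.ones((1, 44))   # shape reported on the left of the error
action_like = np.ones((1, 3))   # shape reported on the right

try:
    state_like * action_like    # elementwise multiply cannot broadcast (1,44) with (1,3)
except ValueError as err:
    print(err)
```

The trailing dimensions 44 and 3 are unequal and neither is 1, so broadcasting fails, which matches the `Incompatible shapes: [1,44] vs. [1,3]` message.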
I also do not understand why action_pl and model output appear to have the same shape (None, 3). For model output this is clearly correct, but the printed action_pl tensor looks like it might actually have a different shape (maybe (1, 44)?), which confuses me completely, since the actions list I pass to the function definitely has shape (999, 3).
I am also not sure which line actually produces the error: judging from the print_tensor lines, I would guess the weighted_actions = K.sum... line, because nothing below it (i.e. deeper in the computation graph) prints anything, but that guess may be wrong.
tl;dr: Which line of a_opt() actually produces the error, where does the erroneous shape [1,44] come from, and is there a better way to debug a computation graph like this?