I am trying to modify the DDPG algorithm to control a musculoskeletal model of an arm. The standard model works fine, but I want the hand-target position to enter the network earlier than the muscle-length feedback. Here is my network diagram, which appears to compile.
As you can see, I use Lambda layers to split the network's input into 47 muscle inputs (left branch) and 3 hand-target coordinates (right branch). I am able to feed data through this model in a forward pass.
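For reference, here is a minimal sketch of that split, ignoring keras-rl's extra window dimension for simplicity; everything beyond the 47/3 split, including the layer sizes and names, is hypothetical rather than my real network:

from keras.layers import Input, Lambda, Dense, Concatenate
from keras.models import Model

obs = Input(shape=(51,), name='observation_input')
muscles = Lambda(lambda x: x[:, :47], name='muscle_lengths')(obs)  # left branch
target = Lambda(lambda x: x[:, 47:], name='hand_target')(obs)      # right branch

# The hand target enters the network first ...
h = Dense(32, activation='relu')(target)
# ... and the muscle-length feedback joins further down.
h = Concatenate()([h, muscles])
h = Dense(32, activation='relu')(h)
actions = Dense(47, activation='sigmoid', name='muscle_excitations')(h)

actor = Model(inputs=obs, outputs=actions)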
The problem occurs when I try to compile my DDPG algorithm with the actor and critic in order to compute the policy gradient. The graph of the critic can be seen here.
Here is the code that does this:
# Combine actor and critic so that we can get the policy gradient.
# Assuming critic's state inputs are the same as actor's.
combined_inputs = []
critic_inputs = []
for i in self.critic.input:
    if i == self.critic_action_input:
        # Reserve a placeholder slot for the action input.
        combined_inputs.append([])
    else:
        combined_inputs.append(i)
        critic_inputs.append(i)
# Fill the action slot with the actor applied to the state inputs.
combined_inputs[self.critic_action_input_idx] = self.actor(critic_inputs)
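For context, this is the structure the loop is meant to produce, written as a minimal standalone sketch with toy layers (all sizes and names here are hypothetical, not my real networks):

from keras.layers import Input, Dense, Concatenate
from keras.models import Model

obs = Input(shape=(51,), name='observation_input')
action = Input(shape=(47,), name='action_input')

# Toy actor: observation -> action.
actor = Model(obs, Dense(47, activation='sigmoid')(Dense(32, activation='relu')(obs)))

# Toy critic: (observation, action) -> Q-value.
q = Dense(1)(Dense(32, activation='relu')(Concatenate()([obs, action])))
critic = Model([obs, action], q)

# Swap the critic's action input for the actor's output, so the policy
# gradient can flow from the Q-value back through the actor's weights:
combined = Model(obs, critic([obs, actor(obs)]))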
The error occurs on the last line of the snippet above:
ValueError: Dimensions must be equal, but are 51 and 47 for 'model_2/dense_9/MatMul' (op: 'MatMul') with input shapes: [0,51], [47,31].
From my diagram, this makes no sense to me. The input space is (None, 51), and critic_inputs is

[<tf.Tensor 'observation_input_4:0' shape=(?, 1, 51) dtype=float32>]
This works with my simpler model.
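In case it helps, this is the kind of shape check that should show where a 51-wide tensor meets a layer built for 47 inputs (a diagnostic sketch, not keras-rl code; the commented shapes are my expectations based on the error above):

print(self.actor.input_shape)   # per the tensor above: (None, 1, 51)
print(self.actor.output_shape)  # this is what fills the critic's action slot
for layer in self.critic.layers:
    try:
        print(layer.name, layer.input_shape)
    except AttributeError:
        print(layer.name, '(multiple inbound nodes)')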
Any suggestions would be greatly appreciated.