I built a Deep-Q network with TensorFlow. When I try to create two of them (I want the network to compete against itself), I get:
ValueError: Trying to share variable dense/kernel, but specified shape (100, 160) and found shape (9, 100).
This is my network:
class QNetwork:
    """
    A Q-Network implementation
    """
    def __init__(self, input_size, output_size, hidden_layers_size, gamma, maximize_entropy, reuse):
        self.q_target = tf.placeholder(shape=(None, output_size), dtype=tf.float32)
        self.r = tf.placeholder(shape=None, dtype=tf.float32)
        self.states = tf.placeholder(shape=(None, input_size), dtype=tf.float32)
        self.enumerated_actions = tf.placeholder(shape=(None, 2), dtype=tf.int32)
        self.learning_rate = tf.placeholder(shape=[], dtype=tf.float32)
        layer = self.states
        for l in hidden_layers_size:
            layer = tf.layers.dense(inputs=layer, units=l, activation=tf.nn.relu,
                                    kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                    reuse=reuse)
        self.output = tf.layers.dense(inputs=layer, units=output_size,
                                      kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                      reuse=reuse)
        self.predictions = tf.gather_nd(self.output, indices=self.enumerated_actions)
        if maximize_entropy:
            self.future_q = tf.log(tf.reduce_sum(tf.exp(self.q_target), axis=1))
        else:
            self.future_q = tf.reduce_max(self.q_target, axis=1)
        self.labels = self.r + (gamma * self.future_q)
        self.cost = tf.reduce_mean(tf.losses.mean_squared_error(labels=self.labels, predictions=self.predictions))
        self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.cost)
This code fails:
q1 = QNetwork(9, 9, [100, 160, 160, 100], gamma=0.99, maximize_entropy=False, reuse=tf.AUTO_REUSE)
q2 = QNetwork(9, 9, [100, 160, 160, 100], gamma=0.99, maximize_entropy=False, reuse=tf.AUTO_REUSE)
Any idea how to solve this? (Running TF 1.10.1, Python 3.6.5)
Answer 0 (score: 0):
Solved.
I needed to:
- use a variable_scope
- put everything inside it with reuse=tf.AUTO_REUSE
- (this was needed for the Adam optimizer as well)
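For reference, a minimal sketch of how that fix might look. The extra name constructor argument, the per-instance scope names ('q1', 'q2'), and the explicit layer names ('hidden_0', 'output') are assumptions added for illustration and are not part of the original code; the point is simply that each network builds all of its variables, including those created by the Adam optimizer, under its own variable_scope opened with reuse=tf.AUTO_REUSE:

import tensorflow as tf

class QNetwork:
    """
    A Q-Network implementation, built inside its own variable scope.
    """
    def __init__(self, input_size, output_size, hidden_layers_size, gamma,
                 maximize_entropy, name):
        # 'name' is a hypothetical extra argument: each instance gets its own
        # scope, so q1 and q2 create independent variables instead of colliding.
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
            self.q_target = tf.placeholder(shape=(None, output_size), dtype=tf.float32)
            self.r = tf.placeholder(shape=None, dtype=tf.float32)
            self.states = tf.placeholder(shape=(None, input_size), dtype=tf.float32)
            self.enumerated_actions = tf.placeholder(shape=(None, 2), dtype=tf.int32)
            self.learning_rate = tf.placeholder(shape=[], dtype=tf.float32)
            layer = self.states
            for i, l in enumerate(hidden_layers_size):
                # Giving every layer a distinct name keeps two differently sized
                # layers from being mapped onto the same 'dense/kernel' variable.
                layer = tf.layers.dense(inputs=layer, units=l, activation=tf.nn.relu,
                                        kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                        name='hidden_%d' % i)
            self.output = tf.layers.dense(inputs=layer, units=output_size,
                                          kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                          name='output')
            self.predictions = tf.gather_nd(self.output, indices=self.enumerated_actions)
            if maximize_entropy:
                self.future_q = tf.log(tf.reduce_sum(tf.exp(self.q_target), axis=1))
            else:
                self.future_q = tf.reduce_max(self.q_target, axis=1)
            self.labels = self.r + (gamma * self.future_q)
            self.cost = tf.reduce_mean(tf.losses.mean_squared_error(labels=self.labels,
                                                                    predictions=self.predictions))
            # minimize() is called inside the with-block, so any variables Adam
            # creates are made while the AUTO_REUSE scope is active (the answer
            # notes this was needed for the optimizer too).
            self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.cost)

q1 = QNetwork(9, 9, [100, 160, 160, 100], gamma=0.99, maximize_entropy=False, name='q1')
q2 = QNetwork(9, 9, [100, 160, 160, 100], gamma=0.99, maximize_entropy=False, name='q2')

With distinct scope names the two instances get independent weights (so they can genuinely compete), while reuse=tf.AUTO_REUSE avoids the "Trying to share variable" error if the same scope is ever entered more than once.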