I built a neural network with TensorFlow; here is the code:
class DQNetwork:
    def __init__(self, state_size, action_size, learning_rate, name='DQNetwork'):
        self.state_size = state_size
        self.action_size = action_size
        self.learning_rate = learning_rate
        with tf.variable_scope(name):
            # We create the placeholders
            self.inputs_ = tf.placeholder(tf.float32, shape=[state_size[1], state_size[0]], name="inputs")
            self.actions_ = tf.placeholder(tf.float32, [None, self.action_size], name="actions_")
            # Remember that target_Q is the R(s,a) + ymax Qhat(s', a')
            self.target_Q = tf.placeholder(tf.float32, [None], name="target")
            self.fc = tf.layers.dense(inputs=self.inputs_,
                                      units=50,
                                      kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                      activation=tf.nn.elu)
            self.output = tf.layers.dense(inputs=self.fc,
                                          units=self.action_size,
                                          kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                          activation=None)
            # Q is our predicted Q value.
            self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_))
            # The loss is the difference between our predicted Q_values and the Q_target
            # Sum(Qtarget - Q)^2
            self.loss = tf.reduce_mean(tf.square(self.target_Q - self.Q))
            self.optimizer = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss)
But there is a problem with my output.
The output should normally have the same size as action_size, and action_size is 3, but I get an output of shape [[5] [3]] instead of [[3]], and I really don't understand why...
This network has 2 dense layers, one with 50 perceptrons and the other with 3 perceptrons (= action_size).
state_size has the format [[9] [5]].
If anyone knows why my output is two-dimensional, I would be very grateful.
Answer 0 (score: 0)
Your self.inputs_ placeholder has shape (5, 9). In the dense layer fc1 you perform a matmul(self.inputs_, fc1.w) operation with a weight matrix of shape (9, 50), which results in shape (5, 50). You then apply another dense layer with a weight matrix of shape (50, 3), which gives an output of shape (5, 3).
The same, schematically:
matmul(shape(5, 9), shape(9, 50)) ---> shape(5, 50)
# output of 1st dense layer
matmul(shape(5, 50), shape(50, 3)) ---> shape(5, 3)
# output of 2nd dense layer
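This shape chain can be verified with plain NumPy (a minimal sketch using dummy all-ones matrices in place of the network's actual inputs and weights):

```python
import numpy as np

# Dummy input batch and weight matrices matching the shapes above
x = np.ones((5, 9))      # the placeholder as declared: shape (5, 9)
w1 = np.ones((9, 50))    # 1st dense layer weight matrix
w2 = np.ones((50, 3))    # 2nd dense layer weight matrix

h = x @ w1               # matmul -> shape (5, 50)
out = h @ w2             # matmul -> shape (5, 3)
print(h.shape, out.shape)  # (5, 50) (5, 3)
```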
In general, the first dimension of an input placeholder represents the batch size, and the second is the dimensionality of the input feature vector. So for each sample in the batch (your batch size is 5), you get an output of size 3.
To get probabilities, you could use the following approach:
import tensorflow as tf
import numpy as np

inputs_ = tf.placeholder(tf.float32, shape=(None, 9))
actions_ = tf.placeholder(tf.float32, shape=(None, 3))
fc = tf.layers.dense(inputs=inputs_, units=2)
output = tf.layers.dense(inputs=fc, units=3)
reduced = tf.reduce_mean(output, axis=0)
probs = tf.nn.softmax(reduced)  # <-- probabilities

inputs_vals = np.ones((5, 9))
actions_vals = np.ones((1, 3))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(probs.eval({inputs_: inputs_vals,
                      actions_: actions_vals}))
    # [0.01858923 0.01566187 0.9657489 ]
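The reduce-then-softmax step from the snippet above can be sketched in plain NumPy (the logits here are illustrative values, not the session's actual output):

```python
import numpy as np

logits = np.array([[1.0, 2.0, 0.5],
                   [0.5, 1.5, 1.0]])  # batch of 2 samples, 3 actions
reduced = logits.mean(axis=0)         # average over the batch -> shape (3,)
shifted = reduced - reduced.max()     # subtract max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum()
print(probs.shape)                    # (3,) -- one probability per action
```

Note that averaging over the batch collapses the per-sample predictions; if you want one probability vector per sample instead, apply tf.nn.softmax directly to the (batch, 3) output without the reduce.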