I built a neural network with TensorFlow; here is the code:
class DQNetwork:
    def __init__(self, state_size, action_size, learning_rate, name='DQNetwork'):
        self.state_size = state_size
        self.action_size = action_size
        self.learning_rate = learning_rate
        with tf.variable_scope(name):
            # We create the placeholders
            self.inputs_ = tf.placeholder(tf.float32, shape=[state_size[1], state_size[0]], name="inputs")
            self.actions_ = tf.placeholder(tf.float32, [None, self.action_size], name="actions_")
            # Remember that target_Q is the R(s,a) + ymax Qhat(s', a')
            self.target_Q = tf.placeholder(tf.float32, [None], name="target")
            self.fc = tf.layers.dense(inputs=self.inputs_,
                                      units=50,
                                      kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                      activation=tf.nn.elu)
            self.output = tf.layers.dense(inputs=self.fc,
                                          units=self.action_size,
                                          kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                          activation=None)
            # Q is our predicted Q value.
            self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_))
            # The loss is the difference between our predicted Q_values and the Q_target
            # Sum(Qtarget - Q)^2
            self.loss = tf.reduce_mean(tf.square(self.target_Q - self.Q))
            self.optimizer = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss)
But there is a problem with my output.
The output should normally have the same size as action_size, and action_size is 3, but I get an output of shape [[5] [3]] instead of [[3]], and I really don't understand why...
This network has 2 dense layers, one with 50 perceptrons and the other with 3 perceptrons (= action_size).
state_size has the format [[9] [5]].
If anyone knows why my output is two-dimensional, I would be very grateful.
Answer 0 (score: 0)
Your self.inputs_ placeholder has shape (5, 9). In the dense layer fc1 you perform a matmul(self.inputs_, fc1.w) operation with a weight matrix of shape (9, 50), which results in shape (5, 50). You then apply another dense layer with a weight matrix of shape (50, 3), which gives an output of shape (5, 3).
The same, schematically:
matmul(shape(5, 9), shape(9, 50)) ---> shape(5, 50)
# output of 1st dense layer
matmul(shape(5, 50), shape(50, 3)) ---> shape(5, 3)
# output of 2nd dense layer
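This shape chain can be verified with plain NumPy (a minimal sketch using dummy all-ones matrices in place of the network's actual inputs and weights):

```python
import numpy as np

# Dummy input batch and weight matrices matching the shapes above
x = np.ones((5, 9))      # the placeholder as declared: shape (5, 9)
w1 = np.ones((9, 50))    # 1st dense layer weight matrix
w2 = np.ones((50, 3))    # 2nd dense layer weight matrix

h = x @ w1               # matmul -> shape (5, 50)
out = h @ w2             # matmul -> shape (5, 3)
print(h.shape, out.shape)  # (5, 50) (5, 3)
```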
In general, the first dimension of an input placeholder represents the batch size, and the second is the dimensionality of the input feature vector. So for each sample in the batch (your batch size is 5), you get an output of size 3.
To get probabilities, you could use the following approach:
import tensorflow as tf
import numpy as np

inputs_ = tf.placeholder(tf.float32, shape=(None, 9))
actions_ = tf.placeholder(tf.float32, shape=(None, 3))
fc = tf.layers.dense(inputs=inputs_, units=2)
output = tf.layers.dense(inputs=fc, units=3)
reduced = tf.reduce_mean(output, axis=0)
probs = tf.nn.softmax(reduced)  # <-- probabilities

inputs_vals = np.ones((5, 9))
actions_vals = np.ones((1, 3))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(probs.eval({inputs_: inputs_vals,
                      actions_: actions_vals}))
    # [0.01858923 0.01566187 0.9657489 ]
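The reduce-then-softmax step from the snippet above can be sketched in plain NumPy (the logits here are illustrative values, not the session's actual output):

```python
import numpy as np

logits = np.array([[1.0, 2.0, 0.5],
                   [0.5, 1.5, 1.0]])  # batch of 2 samples, 3 actions
reduced = logits.mean(axis=0)         # average over the batch -> shape (3,)
shifted = reduced - reduced.max()     # subtract max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum()
print(probs.shape)                    # (3,) -- one probability per action
```

Note that averaging over the batch collapses the per-sample predictions; if you want one probability vector per sample instead, apply tf.nn.softmax directly to the (batch, 3) output without the reduce.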