Neural network outputs the same value for every input after training

Date: 2016-12-16 23:00:22

Tags: python tensorflow conv-neural-network

I'm trying to build a DQN. I have a convolutional neural network whose input has shape None x WIDTH x HEIGHT x FRAME_COUNT. The FULLY_CONNECTED_SIZE constant is computed so that the output has shape [3] for an input of shape 1 x WIDTH x HEIGHT x FRAME_COUNT.

    FULLY_CONNECTED_SIZE = (WIDTH / 8) * (HEIGHT / 8) * 32
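
For context: with SAME padding, the two convolutions below (stride 4, then stride 2) downsample each spatial dimension by a factor of 8, which is where this formula comes from. A quick check with hypothetical dimensions (the question does not state WIDTH or HEIGHT):

    # Hypothetical sizes for illustration only; WIDTH and HEIGHT are not given in the question.
    WIDTH, HEIGHT = 80, 80
    # With SAME padding: conv1 (stride 4) gives 80/4 = 20, conv2 (stride 2) gives 20/2 = 10,
    # so the overall spatial downsampling is 8x, and the last conv layer has 32 channels.
    FULLY_CONNECTED_SIZE = (WIDTH // 8) * (HEIGHT // 8) * 32
    print(FULLY_CONNECTED_SIZE)  # 10 * 10 * 32 = 3200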

    def createNetwork(self):
        conv_layer_1_biases = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[16]))
        conv_layer_1_weights = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[8, 8, FRAME_COUNT, 16]))
        input_layer = tf.placeholder("float", [None, WIDTH, HEIGHT, FRAME_COUNT])
        conv_layer_1 = tf.nn.relu(tf.nn.conv2d(input_layer, strides=[1, 4, 4, 1], filter=conv_layer_1_weights, padding='SAME') + conv_layer_1_biases)

        conv_layer_2_biases = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[32]))
        conv_layer_2_weights = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[4, 4, 16, 32]))
        conv_layer_2 = tf.nn.relu(tf.nn.conv2d(conv_layer_1, strides=[1, 2, 2, 1], filter=conv_layer_2_weights, padding='SAME') + conv_layer_2_biases)

        reshaped_layer = tf.reshape(conv_layer_2, [-1, FULLY_CONNECTED_SIZE])

        fully_connected_layer_weights = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[FULLY_CONNECTED_SIZE, 256]))
        fully_connected_layer_biases = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[256]))

        fully_connected_layer = tf.nn.relu(tf.matmul(reshaped_layer, fully_connected_layer_weights) + fully_connected_layer_biases)

        output_layer_weights = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[256, NUMBER_OF_ACTIONS]))
        output_layer_biases = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[NUMBER_OF_ACTIONS]))

        output_layer = tf.matmul(fully_connected_layer, output_layer_weights) + output_layer_biases

        return input_layer, output_layer
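
One thing worth noting about the snippet above: np.random.uniform(-1, 1) returns a single scalar, and tf.constant(scalar, shape=[...]) broadcasts that one value over the whole shape, so every weight within a layer starts out identical. If an independent random value per element is the intent, a minimal sketch (same variable names as above) would be:

    # Draws an independent uniform value for every element of the tensor,
    # instead of broadcasting one shared scalar across the whole shape.
    conv_layer_1_weights = tf.Variable(tf.random_uniform([8, 8, FRAME_COUNT, 16], -1, 1))
    conv_layer_1_biases = tf.Variable(tf.random_uniform([16], -1, 1))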

I train it like this:

    self.inputQ, self.outputQ = self.createNetwork()
    self._session = tf.Session()
    self._action = tf.placeholder("float", [None, NUMBER_OF_ACTIONS])
    self._target = tf.placeholder("float", [None])
    readout_action = tf.reduce_sum(tf.mul(self.outputQ, self._action), reduction_indices=1)

    cost = tf.reduce_mean(tf.square(self._target - readout_action))
    self._train_operation = tf.train.GradientDescentOptimizer(1e-4).minimize(cost)
    self._session.run(tf.initialize_all_variables())
    ...
    self._session.run(self._train_operation, feed_dict={self._target: targets, self._action: actions, self.inputQ: before_states})
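
For reference, readout_action reduces each row of Q-values to the Q-value of the action actually taken, assuming each row of actions is one-hot. A numeric sketch with invented values:

    import numpy as np

    # Invented values for illustration: the Q-vector for one state and a one-hot action row.
    q_row = np.array([[1.5, -0.2, 0.7]])   # outputQ for a single state, 3 actions
    action = np.array([[0.0, 1.0, 0.0]])   # action 1 was taken
    # Mirrors tf.reduce_sum(tf.mul(outputQ, action), reduction_indices=1)
    print(np.sum(q_row * action, axis=1))  # -> [-0.2]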

before_states is an array of N states; each state is FRAME_COUNT images of size WIDTH x HEIGHT whose elements are 1 or 0 (1 for a white pixel, 0 for a black one), so the overall shape is N x WIDTH x HEIGHT x FRAME_COUNT. I also have a Q function:

    def Q(self, states):
        return self._session.run(self.outputQ, feed_dict={self.inputQ: states})
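
Putting the pieces together, a sketch of how such a batch could be built and fed to Q, with hypothetical dimensions (none of these sizes are given in the question):

    import numpy as np

    # Hypothetical sizes for illustration only.
    N, WIDTH, HEIGHT, FRAME_COUNT = 2, 80, 80, 4
    # Binary frames: 1 for a white pixel, 0 for a black pixel.
    before_states = np.random.randint(0, 2, size=(N, WIDTH, HEIGHT, FRAME_COUNT)).astype(np.float32)
    print(before_states.shape)  # (2, 80, 80, 4)
    # Q expects a batch, so a single state is wrapped in a list, giving an input of
    # shape (1, WIDTH, HEIGHT, FRAME_COUNT) and an output of shape (1, NUMBER_OF_ACTIONS):
    # q_values = agent.Q([before_states[0]])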

My problem:

At first, Q([state]) is different for every state (a state being FRAME_COUNT images of size WIDTH x HEIGHT), so the network works as expected on an input of shape 1 x WIDTH x HEIGHT x FRAME_COUNT.

After the first training step, Q([state]) = Q1 is the same for every possible state.

After the second training step, Q([state]) = Q2 is the same for every possible state.

After the n-th training step, Q([state]) = Qn is the same for every possible state.

Why does this happen? The network's output should be different for each input state. What can I do about it? I have already tried different learning rates, different optimizers (gradient descent, Adam), and different initial weights.

0 Answers:

There are no answers.