My bidirectional RNN doesn't converge.
The problem I've found is that the loss only decreases when all the labels in a batch are identical (all 1s or all 0s).
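Concretely, if I build label batches like the two below (just an illustration; `batch_size` stands in for my real batch size), batches like `same` let the loss drop, while batches like `mixed` leave it stuck:

```python
import numpy as np

batch_size = 32  # placeholder for my real batch size

# Labels have shape [batch_size, 1], matching labels_ph in the code below.
same = np.ones((batch_size, 1), dtype=np.float64)                         # all 1s  -> loss decreases
mixed = np.random.randint(0, 2, size=(batch_size, 1)).astype(np.float64)  # mixed 0s/1s -> loss stalls
```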
I've tried overfitting the network on a single example, and that works fine. But when I move on to fitting several examples in one batch, it doesn't converge.
I've already checked that the embedding lookup is correct, even without the BiRNN itself. Can anyone help me? Thanks in advance.
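The embedding check was basically the following (here `sess`, `model`, and `embedding_matrix` are placeholders for my session, my model object, and the pretrained matrix I load into the `embeddings` variable):

```python
import numpy as np

# Look up a tiny batch of token ids and verify the result equals the
# corresponding rows of the pretrained embedding matrix.
ids = np.zeros((1, max_sentence_length), dtype=np.int32)
ids[0, :3] = [5, 17, 42]  # arbitrary token ids, the rest is padding

embedded = sess.run(model.context_embed_ph, feed_dict={model.context_ph: ids})
assert np.allclose(embedded[0, 0], embedding_matrix[5])
assert np.allclose(embedded[0, 1], embedding_matrix[17])
```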
Code (this is a function inside my model object):
```python
with tf.device('/gpu:' + str(self.gpu_idx)):
    with tf.variable_scope('input'):
        context_ph = tf.placeholder(tf.int32, shape=[None, max_sentence_length], name='context_ph')
        labels_ph = tf.placeholder(tf.float64, shape=[None, 1], name='labels_ph')
        context_length_ph = tf.placeholder(tf.int32, shape=[None, ], name='context_length')
        dropout_keep_prob_ph = tf.placeholder(tf.float64, name='dropout_keep_prob_ph')
        self.context_ph = context_ph
        self.dropout_keep_prob_ph = dropout_keep_prob_ph
        self.labels_ph = labels_ph
        self.context_length_ph = context_length_ph

    with tf.variable_scope('operate_idx2embedding'):
        embedding_table_ph = tf.get_variable("embeddings", shape=[vocab_size, embedding_dim],
                                             trainable=False, dtype="float64")
        context_embed_ph = tf.nn.embedding_lookup(embedding_table_ph, context_ph)
        self.context_embed_ph = context_embed_ph

    with tf.variable_scope('BiGRU'):
        gruCell_fw = tf.nn.rnn_cell.GRUCell(gru_dim)
        gruCell_bw = tf.nn.rnn_cell.GRUCell(gru_dim)
        # init_state = gruCell.zero_state(None, dtype=tf.float32)
        outputs, _ = tf.nn.bidirectional_dynamic_rnn(gruCell_fw, gruCell_bw, context_embed_ph,
                                                     time_major=False, dtype="float64",
                                                     sequence_length=self.context_length_ph)

    with tf.variable_scope('Attention_layer'):
        attention_output, alphas = attention(outputs, 250, return_alphas=True)
        drop = tf.nn.dropout(attention_output, self.dropout_keep_prob_ph)

    with tf.variable_scope('Affine_layer'):
        # Hidden size is multiplied by 2 for the Bi-RNN
        W = tf.Variable(tf.truncated_normal([gru_dim * 2, 1], stddev=0.1, dtype="float64"), dtype="float64")
        b = tf.Variable(tf.constant(0., shape=[1], dtype="float64"), dtype="float64")
        y_hat = tf.nn.xw_plus_b(drop, W, b)
        y_hat = tf.squeeze(y_hat)

    with tf.variable_scope('loss'):
        loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_hat, labels=self.labels_ph))
        self.accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(tf.sigmoid(y_hat)), self.labels_ph), tf.float32))
        self.loss = loss
```
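The `attention` function isn't shown above; it follows the usual additive-attention recipe over the Bi-RNN outputs. A simplified sketch of what I use (the exact variable names and initializers may differ):

```python
def attention(inputs, attention_size, return_alphas=False):
    # bidirectional_dynamic_rnn returns a (forward, backward) tuple; concatenate on the feature axis
    if isinstance(inputs, tuple):
        inputs = tf.concat(inputs, 2)
    hidden_size = inputs.shape[2].value  # 2 * gru_dim

    w_omega = tf.Variable(tf.random_normal([hidden_size, attention_size], stddev=0.1, dtype=tf.float64))
    b_omega = tf.Variable(tf.random_normal([attention_size], stddev=0.1, dtype=tf.float64))
    u_omega = tf.Variable(tf.random_normal([attention_size], stddev=0.1, dtype=tf.float64))

    v = tf.tanh(tf.tensordot(inputs, w_omega, axes=1) + b_omega)    # [batch, time, attention_size]
    vu = tf.tensordot(v, u_omega, axes=1)                           # [batch, time]
    alphas = tf.nn.softmax(vu)                                      # attention weight per time step
    output = tf.reduce_sum(inputs * tf.expand_dims(alphas, -1), 1)  # [batch, 2 * gru_dim]

    return (output, alphas) if return_alphas else output
```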
Outside this section, I use:

```python
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, name='optimizer').minimize(self.loss)
```
to minimize the loss. I suspect it's some silly mistake on my part, but I can't figure out what it is.
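For completeness, the training loop is essentially the following (`sess`, the batch arrays, and the batch iteration are placeholders for my actual pipeline); this is where I see the loss only moving on same-label batches:

```python
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for epoch in range(num_epochs):
    for context_batch, length_batch, label_batch in batches:  # label_batch has shape [batch_size, 1]
        _, loss_val, acc_val = sess.run(
            [optimizer, model.loss, model.accuracy],
            feed_dict={model.context_ph: context_batch,
                       model.labels_ph: label_batch,
                       model.context_length_ph: length_batch,
                       model.dropout_keep_prob_ph: 0.8})
    print(epoch, loss_val, acc_val)
```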