My bidirectional RNN doesn't converge.
The problem I've found is that the loss only decreases when all the labels in a batch are identical (all 1s or all 0s).
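Concretely, if I build label batches like the two below (just an illustration; `batch_size` stands in for my real batch size), batches like `same` let the loss drop, while batches like `mixed` leave it stuck:

```python
import numpy as np

batch_size = 32  # placeholder for my real batch size

# Labels have shape [batch_size, 1], matching labels_ph in the code below.
same = np.ones((batch_size, 1), dtype=np.float64)                         # all 1s  -> loss decreases
mixed = np.random.randint(0, 2, size=(batch_size, 1)).astype(np.float64)  # mixed 0s/1s -> loss stalls
```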
I've tried overfitting the network on a single example, and that works fine. But when I move on to fitting several examples in one batch, it doesn't converge.
I've already checked that the embedding lookup is correct, even without the BiRNN itself. Can anyone help me? Thanks in advance.
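The embedding check was basically the following (here `sess`, `model`, and `embedding_matrix` are placeholders for my session, my model object, and the pretrained matrix I load into the `embeddings` variable):

```python
import numpy as np

# Look up a tiny batch of token ids and verify the result equals the
# corresponding rows of the pretrained embedding matrix.
ids = np.zeros((1, max_sentence_length), dtype=np.int32)
ids[0, :3] = [5, 17, 42]  # arbitrary token ids, the rest is padding

embedded = sess.run(model.context_embed_ph, feed_dict={model.context_ph: ids})
assert np.allclose(embedded[0, 0], embedding_matrix[5])
assert np.allclose(embedded[0, 1], embedding_matrix[17])
```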
Code (this is a function inside my model object):
```python
with tf.device('/gpu:' + str(self.gpu_idx)):
    with tf.variable_scope('input'):
        context_ph = tf.placeholder(tf.int32, shape=[None, max_sentence_length], name='context_ph')
        labels_ph = tf.placeholder(tf.float64, shape=[None, 1], name='labels_ph')
        context_length_ph = tf.placeholder(tf.int32, shape=[None, ], name='context_length')
        dropout_keep_prob_ph = tf.placeholder(tf.float64, name='dropout_keep_prob_ph')
        self.context_ph = context_ph
        self.dropout_keep_prob_ph = dropout_keep_prob_ph
        self.labels_ph = labels_ph
        self.context_length_ph = context_length_ph

    with tf.variable_scope('operate_idx2embedding'):
        embedding_table_ph = tf.get_variable("embeddings", shape=[vocab_size, embedding_dim],
                                             trainable=False, dtype="float64")
        context_embed_ph = tf.nn.embedding_lookup(embedding_table_ph, context_ph)
        self.context_embed_ph = context_embed_ph

    with tf.variable_scope('BiGRU'):
        gruCell_fw = tf.nn.rnn_cell.GRUCell(gru_dim)
        gruCell_bw = tf.nn.rnn_cell.GRUCell(gru_dim)
        # init_state = gruCell.zero_state(None, dtype=tf.float32)
        outputs, _ = tf.nn.bidirectional_dynamic_rnn(gruCell_fw, gruCell_bw, context_embed_ph,
                                                     time_major=False, dtype="float64",
                                                     sequence_length=self.context_length_ph)

    with tf.variable_scope('Attention_layer'):
        attention_output, alphas = attention(outputs, 250, return_alphas=True)
        drop = tf.nn.dropout(attention_output, self.dropout_keep_prob_ph)

    with tf.variable_scope('Affine_layer'):
        # Hidden size is multiplied by 2 for the Bi-RNN
        W = tf.Variable(tf.truncated_normal([gru_dim * 2, 1], stddev=0.1, dtype="float64"), dtype="float64")
        b = tf.Variable(tf.constant(0., shape=[1], dtype="float64"), dtype="float64")
        y_hat = tf.nn.xw_plus_b(drop, W, b)
        y_hat = tf.squeeze(y_hat)

    with tf.variable_scope('loss'):
        loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_hat, labels=self.labels_ph))
        self.accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(tf.sigmoid(y_hat)), self.labels_ph), tf.float32))
        self.loss = loss
```
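The `attention` function isn't shown above; it follows the usual additive-attention recipe over the Bi-RNN outputs. A simplified sketch of what I use (the exact variable names and initializers may differ):

```python
def attention(inputs, attention_size, return_alphas=False):
    # bidirectional_dynamic_rnn returns a (forward, backward) tuple; concatenate on the feature axis
    if isinstance(inputs, tuple):
        inputs = tf.concat(inputs, 2)
    hidden_size = inputs.shape[2].value  # 2 * gru_dim

    w_omega = tf.Variable(tf.random_normal([hidden_size, attention_size], stddev=0.1, dtype=tf.float64))
    b_omega = tf.Variable(tf.random_normal([attention_size], stddev=0.1, dtype=tf.float64))
    u_omega = tf.Variable(tf.random_normal([attention_size], stddev=0.1, dtype=tf.float64))

    v = tf.tanh(tf.tensordot(inputs, w_omega, axes=1) + b_omega)    # [batch, time, attention_size]
    vu = tf.tensordot(v, u_omega, axes=1)                           # [batch, time]
    alphas = tf.nn.softmax(vu)                                      # attention weight per time step
    output = tf.reduce_sum(inputs * tf.expand_dims(alphas, -1), 1)  # [batch, 2 * gru_dim]

    return (output, alphas) if return_alphas else output
```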
Outside this section, I use:

```python
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, name='optimizer').minimize(self.loss)
```
to minimize the loss. I suspect it's some silly mistake on my part, but I can't figure out what it is.
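For completeness, the training loop is essentially the following (`sess`, the batch arrays, and the batch iteration are placeholders for my actual pipeline); this is where I see the loss only moving on same-label batches:

```python
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for epoch in range(num_epochs):
    for context_batch, length_batch, label_batch in batches:  # label_batch has shape [batch_size, 1]
        _, loss_val, acc_val = sess.run(
            [optimizer, model.loss, model.accuracy],
            feed_dict={model.context_ph: context_batch,
                       model.labels_ph: label_batch,
                       model.context_length_ph: length_batch,
                       model.dropout_keep_prob_ph: 0.8})
    print(epoch, loss_val, acc_val)
```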