I am trying to build a simple TensorFlow example of a multilayer perceptron (MLP) with one hidden layer. However, when I test it and compare against other software, e.g. Kaldi nnet1, the convergence during training is not efficient, or at least not comparable to Kaldi nnet1. I tried my best to make all the parameters the same (input, int targets, batch size, learning rate, etc.), yet I am still puzzled about where the cause might be. Some of the code is below:
Initialization:
# input features (440-dim), integer targets, and a fed-in learning rate
self.nn_in = tf.placeholder("float", shape=[None, 440])
self.nn_tgt = tf.placeholder("int64", shape=[None])
self.learn_rate = tf.placeholder("float")
# hidden layer: 440 -> 8192, sigmoid activation
self.weight = [tf.Variable(tf.truncated_normal([440, 8192], stddev=0.1))]
self.bias = [tf.Variable(tf.constant(0.01, shape=[8192]))]
# output layer: 8192 -> 8
self.weight.append(tf.Variable(tf.truncated_normal([8192, 8], stddev=0.1)))
self.bias.append(tf.Variable(tf.constant(0.01, shape=[8])))
self.act = [tf.nn.sigmoid(tf.matmul(self.nn_in, self.weight[0]) + self.bias[0])]
self.nn_out = tf.matmul(self.act[0], self.weight[1]) + self.bias[1]
self.nn_softmax = tf.nn.softmax(self.nn_out)
self.cost_mean = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.nn_out, labels=self.nn_tgt))
self.train_step = tf.train.GradientDescentOptimizer(self.learn_rate).minimize(self.cost_mean)
# saver and session
self.saver = tf.train.Saver()
self.sess = tf.Session()
self.sess.run(tf.initialize_all_variables())
Training:
loss = np.inf  # best mean CV loss seen so far
for epoch in xrange(20):
    # shuffle the training data each epoch (sklearn.utils.shuffle)
    feats_tr, tgts_tr = shuffle(feats_tr, tgts_tr, random_state=777)
    # restore the existing model
    ckpt = tf.train.get_checkpoint_state(ckpt_dir)
    if ckpt and ckpt.model_checkpoint_path:
        self.load(ckpt.model_checkpoint_path)
    # mini-batch training
    tr_loss = []
    for idx_begin in range(0, len(feats_tr), batch_size):
        idx_end = idx_begin + batch_size
        batch_feats, batch_tgts = feats_tr[idx_begin:idx_end], tgts_tr[idx_begin:idx_end]
        _, loss_val = self.sess.run([self.train_step, self.cost_mean],
                                    feed_dict={self.nn_in: batch_feats,
                                               self.nn_tgt: batch_tgts,
                                               self.learn_rate: learn_rate})
        tr_loss.append(loss_val)
    # cross-validation
    cv_loss = []
    for idx_begin in range(0, len(feats_cv), batch_size):
        idx_end = idx_begin + batch_size
        batch_feats, batch_tgts = feats_cv[idx_begin:idx_end], tgts_cv[idx_begin:idx_end]
        cv_loss.append(self.sess.run(self.cost_mean,
                                     feed_dict={self.nn_in: batch_feats,
                                                self.nn_tgt: batch_tgts}))
    print("Avg Loss for Training: " + str(np.mean(tr_loss)) +
          " Avg Loss for Validation: " + str(np.mean(cv_loss)))
    # save the model if np.mean(cv_loss) improved on the previous best
    loss_new = np.mean(cv_loss)
    if loss_new < loss:
        loss = loss_new
        print("Model accepted in epoch %d" % (epoch + 1))
        # save model to ckpt_dir under the name mdl_nam
        self.saver.save(self.sess, mdl_nam, global_step=epoch + 1)
    else:
        print("Model rejected in epoch %d" % (epoch + 1))
I implemented a simple annealing control for the learning rate: if the average cross-validation loss does not improve by a certain threshold, 'learn_rate' is halved; it starts at 0.008.
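Roughly, the annealing logic looks like this (a minimal sketch; halving_threshold, prev_cv_loss, and the threshold value are illustrative, not taken verbatim from my code):

learn_rate = 0.008
halving_threshold = 0.001  # assumed minimum required improvement in mean CV loss
prev_cv_loss = float('inf')
# ... at the end of each epoch, after computing cv_loss ...
cur_cv_loss = np.mean(cv_loss)
if prev_cv_loss - cur_cv_loss < halving_threshold:
    learn_rate *= 0.5  # halve the rate when CV loss stops improving enough
prev_cv_loss = cur_cv_loss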
I checked all the parameters against Kaldi nnet1, and the only remaining difference is the initialization of the weights and biases. I am not sure whether the initialization matters that much. However, the convergence in terms of 'cv_loss' during training in TensorFlow (average CV loss 1.99) is not as good as in Kaldi nnet1 (average CV loss 0.95). Can someone help point out what I did wrong or what I am missing? Many thanks in advance!!!
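For reference, one standard alternative to a fixed stddev of 0.1 is to scale the stddev by the layer's fan-in (Glorot/Xavier-style). A hypothetical sketch of what that would look like here (not necessarily what Kaldi nnet1 does internally):

import math
# hypothetical fan-in-scaled initialization; with a fan-in of 8192 a fixed
# stddev of 0.1 can produce large initial logits, which may slow early
# convergence compared to a scaled init
w_hid = tf.Variable(tf.truncated_normal([440, 8192], stddev=1.0 / math.sqrt(440)))
w_out = tf.Variable(tf.truncated_normal([8192, 8], stddev=1.0 / math.sqrt(8192)))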
Answer 0 (score: 0)
At every epoch you call self.load(ckpt.model_checkpoint_path), which appears to load previously saved weights. Your model cannot learn anything if it is reset to the initial weights at every epoch.
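A minimal fix is to restore the checkpoint once, before the epoch loop (a sketch reusing self.load and ckpt_dir from your code):

# restore once, before training starts, instead of inside the epoch loop
ckpt = tf.train.get_checkpoint_state(ckpt_dir)
if ckpt and ckpt.model_checkpoint_path:
    self.load(ckpt.model_checkpoint_path)
for epoch in xrange(20):
    # ... train, validate, and save as before, without re-loading ...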