During training, my .meta file keeps getting bigger, and training slows down at the same time. defect-ckpt-0.meta is 2.7M, while defect-ckpt-100.meta is 78M.
Below is my log:
01 Mar 2018 14:50:18 INFO 5641 EPOCH: 1 LOSS: 13.2140 STEP: 0
01 Mar 2018 14:50:20 INFO 5641 [model saved] EPOCH: 1 LOSS: 13.2140 STEP: 0
01 Mar 2018 14:51:18 INFO 5641 EPOCH: 1 LOSS: 8.9733 STEP: 10
01 Mar 2018 14:52:29 INFO 5641 EPOCH: 1 LOSS: 11.6544 STEP: 20
01 Mar 2018 14:53:53 INFO 5641 EPOCH: 1 LOSS: 9.9183 STEP: 30
01 Mar 2018 14:55:35 INFO 5641 EPOCH: 1 LOSS: 7.0450 STEP: 40
01 Mar 2018 14:57:25 INFO 5641 EPOCH: 1 LOSS: 8.1608 STEP: 50
01 Mar 2018 14:59:31 INFO 5641 EPOCH: 1 LOSS: 11.5794 STEP: 60
01 Mar 2018 15:01:51 INFO 5641 EPOCH: 1 LOSS: 9.8649 STEP: 70
01 Mar 2018 15:04:24 INFO 5641 EPOCH: 1 LOSS: 5.6884 STEP: 80
01 Mar 2018 15:07:17 INFO 5641 EPOCH: 1 LOSS: 7.0394 STEP: 90
01 Mar 2018 15:10:18 INFO 5641 EPOCH: 1 LOSS: 11.0385 STEP: 100
01 Mar 2018 15:11:09 INFO 5641 [model saved] EPOCH: 1 LOSS: 11.0385 STEP: 100
01 Mar 2018 15:14:27 INFO 5641 EPOCH: 1 LOSS: 7.6145 STEP: 110
01 Mar 2018 15:18:00 INFO 5641 EPOCH: 1 LOSS: 20.3605 STEP: 120
01 Mar 2018 15:21:51 INFO 5641 EPOCH: 1 LOSS: 8.6141 STEP: 130
01 Mar 2018 15:25:51 INFO 5641 EPOCH: 1 LOSS: 8.8579 STEP: 140
01 Mar 2018 15:30:15 INFO 5641 EPOCH: 1 LOSS: 8.0344 STEP: 150
01 Mar 2018 15:34:50 INFO 5641 EPOCH: 1 LOSS: 7.9116 STEP: 160
01 Mar 2018 15:39:37 INFO 5641 EPOCH: 1 LOSS: 12.1991 STEP: 170
01 Mar 2018 15:44:39 INFO 5641 EPOCH: 1 LOSS: 8.8730 STEP: 180
01 Mar 2018 15:49:57 INFO 5641 EPOCH: 1 LOSS: 9.3560 STEP: 190
01 Mar 2018 15:55:34 INFO 5641 EPOCH: 1 LOSS: 12.2240 STEP: 200
01 Mar 2018 16:01:20 INFO 5641 EPOCH: 1 LOSS: 6.1615 STEP: 210
01 Mar 2018 16:07:22 INFO 5641 EPOCH: 1 LOSS: 8.3846 STEP: 220
01 Mar 2018 16:13:36 INFO 5641 EPOCH: 1 LOSS: 11.8843 STEP: 230
You can see that the time interval between training steps is growing quickly.
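To put numbers on the slowdown, here is a small sketch that parses the log above with the standard library and reports the seconds elapsed between consecutive logged steps. The names `LOG` and `step_intervals` are made up for this illustration and are not part of my training code; the "[model saved]" lines are skipped so that each step is counted once.

```python
from datetime import datetime

# The log from above, pasted verbatim.
LOG = """\
01 Mar 2018 14:50:18 INFO 5641 EPOCH: 1 LOSS: 13.2140 STEP: 0
01 Mar 2018 14:50:20 INFO 5641 [model saved] EPOCH: 1 LOSS: 13.2140 STEP: 0
01 Mar 2018 14:51:18 INFO 5641 EPOCH: 1 LOSS: 8.9733 STEP: 10
01 Mar 2018 14:52:29 INFO 5641 EPOCH: 1 LOSS: 11.6544 STEP: 20
01 Mar 2018 14:53:53 INFO 5641 EPOCH: 1 LOSS: 9.9183 STEP: 30
01 Mar 2018 14:55:35 INFO 5641 EPOCH: 1 LOSS: 7.0450 STEP: 40
01 Mar 2018 14:57:25 INFO 5641 EPOCH: 1 LOSS: 8.1608 STEP: 50
01 Mar 2018 14:59:31 INFO 5641 EPOCH: 1 LOSS: 11.5794 STEP: 60
01 Mar 2018 15:01:51 INFO 5641 EPOCH: 1 LOSS: 9.8649 STEP: 70
01 Mar 2018 15:04:24 INFO 5641 EPOCH: 1 LOSS: 5.6884 STEP: 80
01 Mar 2018 15:07:17 INFO 5641 EPOCH: 1 LOSS: 7.0394 STEP: 90
01 Mar 2018 15:10:18 INFO 5641 EPOCH: 1 LOSS: 11.0385 STEP: 100
01 Mar 2018 15:11:09 INFO 5641 [model saved] EPOCH: 1 LOSS: 11.0385 STEP: 100
01 Mar 2018 15:14:27 INFO 5641 EPOCH: 1 LOSS: 7.6145 STEP: 110
01 Mar 2018 15:18:00 INFO 5641 EPOCH: 1 LOSS: 20.3605 STEP: 120
01 Mar 2018 15:21:51 INFO 5641 EPOCH: 1 LOSS: 8.6141 STEP: 130
01 Mar 2018 15:25:51 INFO 5641 EPOCH: 1 LOSS: 8.8579 STEP: 140
01 Mar 2018 15:30:15 INFO 5641 EPOCH: 1 LOSS: 8.0344 STEP: 150
01 Mar 2018 15:34:50 INFO 5641 EPOCH: 1 LOSS: 7.9116 STEP: 160
01 Mar 2018 15:39:37 INFO 5641 EPOCH: 1 LOSS: 12.1991 STEP: 170
01 Mar 2018 15:44:39 INFO 5641 EPOCH: 1 LOSS: 8.8730 STEP: 180
01 Mar 2018 15:49:57 INFO 5641 EPOCH: 1 LOSS: 9.3560 STEP: 190
01 Mar 2018 15:55:34 INFO 5641 EPOCH: 1 LOSS: 12.2240 STEP: 200
01 Mar 2018 16:01:20 INFO 5641 EPOCH: 1 LOSS: 6.1615 STEP: 210
01 Mar 2018 16:07:22 INFO 5641 EPOCH: 1 LOSS: 8.3846 STEP: 220
01 Mar 2018 16:13:36 INFO 5641 EPOCH: 1 LOSS: 11.8843 STEP: 230
"""


def step_intervals(log_text):
    """Return (step, seconds since the previous logged step) pairs."""
    stamps = []
    for line in log_text.splitlines():
        # Skip checkpoint lines and anything that is not a step line.
        if "[model saved]" in line or "STEP:" not in line:
            continue
        # The timestamp is everything before " INFO ".
        ts = datetime.strptime(line.split(" INFO ", 1)[0], "%d %b %Y %H:%M:%S")
        step = int(line.rsplit("STEP:", 1)[1])
        stamps.append((step, ts))
    # Pair each logged step with the one before it.
    return [(s2, (t2 - t1).total_seconds())
            for (_, t1), (s2, t2) in zip(stamps, stamps[1:])]


for step, seconds in step_intervals(LOG):
    print("step {:>3}: {:.0f} s since previous logged step".format(step, seconds))
```

On this log, the first 10 steps take 60 s and the last 10 (steps 220 to 230) take 374 s.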
Here is my code. data.input_fn is the input function, which feeds in the data using Python's yield.
with tf.Session() as sess:
    step = 0
    epoch = 0
    tf.global_variables_initializer().run()
    tf.train.write_graph(sess.graph_def, '.', 'model/defect.pbtxt')
    loss_mean = 10000
    loss_list = []
    logger.info('training start!')
    while True:
        for img, label_no_satur, label_satur in data.input_fn(
                file_path=conf.train_path,
                infogain_path_dict=conf.train_infogain,
                batch_size=conf.batch_size):
            img, label_no_satur, label_satur = sess.run([img, label_no_satur, label_satur])
            _, L, summary = sess.run([train_op, loss, merged],
                                     feed_dict={IMG_IN: img,
                                                LABEL_NO_SATUR: label_no_satur,
                                                LABEL_SATUR: label_satur})
            train_writer.add_summary(summary, global_step=step)
            loss_list.append(L)
            if step % conf.log_step == 0:
                logger.info("EPOCH: {} LOSS: {:.4f} STEP: {}".format(epoch + 1, L, step))
            if step % conf.save_step == 0:
                if sum(loss_list) / len(loss_list) < loss_mean:
                    saver.save(sess, 'model/defect-ckpt', global_step=step)
                    logger.info("[model saved] EPOCH: {} LOSS: {:.4f} STEP: {}".format(epoch + 1, L, step))
                    loss_mean = sum(loss_list) / len(loss_list)
                loss_list = []
            step += 1
        epoch += 1
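It seems that my model graph is growing during training, but I am not adding any ops to the graph.
I trained the same model with code implemented using estimator; there the .meta file is 2.7M and never grows, and training is also faster.
What is wrong with my sess.run() code?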