Why does my .meta file grow during training?

Time: 2018-03-01 08:24:35

Tags: python tensorflow

During training, my .meta file keeps getting larger, and training slows down at the same time.

defect-ckpt-0.meta is 2.7M, while defect-ckpt-100.meta is 78M.

Below is my log:

01 Mar 2018 14:50:18 INFO 5641 EPOCH: 1 LOSS: 13.2140 STEP: 0
01 Mar 2018 14:50:20 INFO 5641 [model saved] EPOCH: 1 LOSS: 13.2140 STEP: 0
01 Mar 2018 14:51:18 INFO 5641 EPOCH: 1 LOSS: 8.9733 STEP: 10
01 Mar 2018 14:52:29 INFO 5641 EPOCH: 1 LOSS: 11.6544 STEP: 20
01 Mar 2018 14:53:53 INFO 5641 EPOCH: 1 LOSS: 9.9183 STEP: 30
01 Mar 2018 14:55:35 INFO 5641 EPOCH: 1 LOSS: 7.0450 STEP: 40
01 Mar 2018 14:57:25 INFO 5641 EPOCH: 1 LOSS: 8.1608 STEP: 50
01 Mar 2018 14:59:31 INFO 5641 EPOCH: 1 LOSS: 11.5794 STEP: 60
01 Mar 2018 15:01:51 INFO 5641 EPOCH: 1 LOSS: 9.8649 STEP: 70
01 Mar 2018 15:04:24 INFO 5641 EPOCH: 1 LOSS: 5.6884 STEP: 80
01 Mar 2018 15:07:17 INFO 5641 EPOCH: 1 LOSS: 7.0394 STEP: 90
01 Mar 2018 15:10:18 INFO 5641 EPOCH: 1 LOSS: 11.0385 STEP: 100
01 Mar 2018 15:11:09 INFO 5641 [model saved] EPOCH: 1 LOSS: 11.0385 STEP: 100
01 Mar 2018 15:14:27 INFO 5641 EPOCH: 1 LOSS: 7.6145 STEP: 110
01 Mar 2018 15:18:00 INFO 5641 EPOCH: 1 LOSS: 20.3605 STEP: 120
01 Mar 2018 15:21:51 INFO 5641 EPOCH: 1 LOSS: 8.6141 STEP: 130
01 Mar 2018 15:25:51 INFO 5641 EPOCH: 1 LOSS: 8.8579 STEP: 140
01 Mar 2018 15:30:15 INFO 5641 EPOCH: 1 LOSS: 8.0344 STEP: 150
01 Mar 2018 15:34:50 INFO 5641 EPOCH: 1 LOSS: 7.9116 STEP: 160
01 Mar 2018 15:39:37 INFO 5641 EPOCH: 1 LOSS: 12.1991 STEP: 170
01 Mar 2018 15:44:39 INFO 5641 EPOCH: 1 LOSS: 8.8730 STEP: 180
01 Mar 2018 15:49:57 INFO 5641 EPOCH: 1 LOSS: 9.3560 STEP: 190
01 Mar 2018 15:55:34 INFO 5641 EPOCH: 1 LOSS: 12.2240 STEP: 200
01 Mar 2018 16:01:20 INFO 5641 EPOCH: 1 LOSS: 6.1615 STEP: 210
01 Mar 2018 16:07:22 INFO 5641 EPOCH: 1 LOSS: 8.3846 STEP: 220
01 Mar 2018 16:13:36 INFO 5641 EPOCH: 1 LOSS: 11.8843 STEP: 230

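You can see that the time between training steps grows quickly: 10 steps take about a minute at the start and about six minutes by step 230.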

Here is my code. data.input_fn is an input function that supplies the data through a Python generator (yield).

with tf.Session() as sess:
    step = 0
    epoch = 0
    tf.global_variables_initializer().run() 
    tf.train.write_graph(sess.graph_def, '.', 'model/defect.pbtxt')
    loss_mean = 10000
    loss_list = []
    logger.info('training start!')
    while True:
        for img, label_no_satur, label_satur in data.input_fn(
                file_path=conf.train_path,
                infogain_path_dict=conf.train_infogain,
                batch_size=conf.batch_size):
            img, label_no_satur, label_satur = sess.run([img, label_no_satur, label_satur])
            _, L, summary = sess.run([train_op, loss, merged],
                                     feed_dict={IMG_IN:         img,
                                                LABEL_NO_SATUR: label_no_satur,
                                                LABEL_SATUR:    label_satur})
            train_writer.add_summary(summary, global_step=step)
            loss_list.append(L)
            if step % conf.log_step == 0:
                logger.info("EPOCH: {} LOSS: {:.4f} STEP: {}".format(epoch+1, L, step))
                if step % conf.save_step == 0:
                    if sum(loss_list) / len(loss_list) < loss_mean:
                        saver.save(sess, 'model/defect-ckpt',global_step=step)
                        logger.info("[model saved] EPOCH: {} LOSS: {:.4f} STEP: {}".format(epoch+1,L, step))
                        loss_mean = sum(loss_list) / len(loss_list)
                    loss_list = []
            step += 1
        epoch += 1

It looks as if my model graph is growing during training, but I am not aware of adding any ops to the graph.
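A minimal sketch of how this suspicion could be checked, assuming TensorFlow graph mode. In the loop above, the values yielded by data.input_fn are passed to sess.run(), which suggests they are tensors created inside the generator; that is an assumption, since the body of data.input_fn is not shown. The function names below and the `x * 2.0` op are hypothetical stand-ins. The first function shows that building ops inside a loop makes the op count rise on every step. The second shows that `Graph.finalize()` makes any later attempt to add an op raise a `RuntimeError`, which points at the offending line.

```python
try:
    import tensorflow.compat.v1 as tf  # TF 1.13+ and TF 2.x
except ImportError:
    import tensorflow as tf  # older TF 1.x releases


def count_ops_per_step(num_steps=3):
    """Return the graph's op count after each step of a loop that builds ops inside it."""
    graph = tf.Graph()
    counts = []
    with graph.as_default(), tf.Session(graph=graph) as sess:
        x = tf.placeholder(tf.float32, shape=[None])
        for _ in range(num_steps):
            # Hypothetical stand-in for code that creates tensors on every iteration.
            doubled = x * 2.0
            sess.run(doubled, feed_dict={x: [1.0, 2.0]})
            counts.append(len(graph.get_operations()))
    return counts


def finalized_graph_rejects_new_ops():
    """Return True if adding an op to a finalized graph raises RuntimeError."""
    graph = tf.Graph()
    with graph.as_default():
        tf.constant(1.0)
        graph.finalize()  # from here on the graph is read-only
        try:
            tf.constant(2.0)  # attempts to add a new op
        except RuntimeError:
            return True
    return False
```

In the training script, the equivalent check would be to call `sess.graph.finalize()` right before the `while True:` loop, or to log `len(sess.graph.get_operations())` next to the loss.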

I trained the same model with an Estimator-based implementation; there the .meta file is 2.7M and never grows, and training is also faster.
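One difference that may be relevant: as far as I understand, an Estimator calls its input_fn once per train() call to build the graph and then runs inside a monitored session that finalizes the graph, so nothing can be added afterwards. A rough sketch of the same pattern with a plain session follows, assuming the generator yields ordinary Python or NumPy data rather than tensors. The names `batches` and `train_with_fixed_graph` are hypothetical, and `tf.reduce_sum` stands in for the real train_op and loss.

```python
try:
    import tensorflow.compat.v1 as tf  # TF 1.13+ and TF 2.x
except ImportError:
    import tensorflow as tf  # older TF 1.x releases


def batches(num_batches, batch_size):
    """Hypothetical input generator that yields plain Python data, not tensors."""
    for i in range(num_batches):
        yield [float(i)] * batch_size


def train_with_fixed_graph(num_batches=4, batch_size=2):
    """Build all ops once, finalize the graph, then only feed data in the loop."""
    graph = tf.Graph()
    op_counts = []
    with graph.as_default():
        x = tf.placeholder(tf.float32, shape=[None], name="x")
        total = tf.reduce_sum(x)  # stand-in for train_op / loss
        graph.finalize()  # any later op creation would raise RuntimeError
        with tf.Session(graph=graph) as sess:
            for batch in batches(num_batches, batch_size):
                sess.run(total, feed_dict={x: batch})
                op_counts.append(len(graph.get_operations()))
    return op_counts
```

With this structure the op count stays the same on every step, so each checkpoint's .meta file would describe the same graph.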

What is wrong with my sess.run() code?

0 answers:

No answers yet