Question

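I implemented a GAN as a TensorFlow Estimator. The full code is in this gist.

The model trains without problems. However, it seems to hang forever in model.evaluate. The log after training is shown below.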

INFO:tensorflow:Starting evaluation at 2018-12-03-02:19:06
INFO:tensorflow:Graph was finalized.
2018-12-03 02:19:06.956750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-03 02:19:06.956781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-03 02:19:06.956786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2018-12-03 02:19:06.956790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2018-12-03 02:19:06.956912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10464 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from /tensorlog/wad/acgan/a51fbd6/model.ckpt-10002
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

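If I use tf.estimator.train_and_evaluate instead, the evaluation accuracy is always 0.5.

I have checked my tfrecords file: it is not empty, and the images and labels can be read from it without any problem. I also tried using the same tfrecords file for both training and evaluation, but I still get the same result.

It looks to me like TensorFlow may be having trouble loading the GAN's weights from the checkpoint. If that is the case, how can I fix it?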

Answer 1

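It turned out that the training argument of batch_normalization and dropout was preventing the weights from being restored. Fixing the value of training to either True or False solves the problem.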
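One common way to give training a fixed value is to derive a plain Python bool from the mode argument that the Estimator passes to model_fn. The sketch below is only an illustration under assumptions: the gist is not reproduced here, so how training was originally set is unknown, and the names training_flag and model_fn_sketch are hypothetical. It uses plain strings that mirror the values of tf.estimator.ModeKeys ("train", "eval", "infer") so that it runs without TensorFlow installed.

```python
# Minimal sketch, not the author's code. The constants mirror the string
# values of tf.estimator.ModeKeys.TRAIN / EVAL / PREDICT.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"


def training_flag(mode):
    """Return a fixed Python bool to pass as `training` to a layer."""
    if mode not in (TRAIN, EVAL, PREDICT):
        raise ValueError("unknown mode: %r" % (mode,))
    # True only while training; False for evaluation and prediction.
    return mode == TRAIN


def model_fn_sketch(features, labels, mode):
    """Hypothetical skeleton showing where the flag would be used."""
    training = training_flag(mode)
    # In a real model_fn the flag would be passed to the layers, e.g.:
    #   x = tf.layers.batch_normalization(x, training=training)
    #   x = tf.layers.dropout(x, rate=0.5, training=training)
    # Here we only return the decision so the sketch stays runnable.
    return {"mode": mode, "training": training}


if __name__ == "__main__":
    for m in (TRAIN, EVAL, PREDICT):
        print(m, model_fn_sketch(None, None, m)["training"])
```

Because the flag is an ordinary Python bool decided when the graph is built, the evaluation graph does not depend on any extra input for it.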
