Question

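I would like to use the TF-Slim implementation at https://github.com/tensorflow/models/tree/master/research/slim to reproduce the results of the NASNet models on CIFAR-10 for benchmarking purposes. To train the model from scratch, I followed the instructions in the comments (lines 31-37) of the script train_image_classifier.py and added the following lines to the original code in /nets/nasnet/models.py: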

After line 247:

elif FLAGS.learning_rate_decay_type == 'cosine':
    return tf.train.cosine_decay(FLAGS.learning_rate,
                                 global_step,
                                 decay_steps,
                                 name='cosine_decay_learning_rate')

After line 536:

clone_gradients = tf.clip_by_global_norm(clones_gradients, 5.0)
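
One detail in the line above may be worth checking. As documented, tf.clip_by_global_norm takes a list of tensors and returns a pair (clipped list, global norm), and the line assigns its result to clone_gradients rather than clones_gradients. The block below is a plain-Python sketch of the computation, not TF-Slim code. It assumes clones_gradients holds (gradient, variable) pairs; the toy values and the names grads_and_vars, w1 and w2 are hypothetical.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients jointly so their combined L2 norm is at most clip_norm.

    Returns (clipped_grads, global_norm), mirroring the pair that
    tf.clip_by_global_norm is documented to return.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The scale is 1.0 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


# Hypothetical (gradient, variable_name) pairs standing in for clones_gradients.
grads_and_vars = [([3.0, 4.0], "w1"), ([12.0], "w2")]

# Split the pairs, clip only the gradients, then re-pair them with their variables.
grads, variables = zip(*grads_and_vars)
clipped, norm = clip_by_global_norm(list(grads), 5.0)
clipped_grads_and_vars = list(zip(clipped, variables))

print(norm)
print(clipped_grads_and_vars)
```

If that assumption holds, the same split, clip and re-pair pattern would apply in the training script, with the re-paired list assigned back to clones_gradients so that the clipped values are the ones passed on to the optimizer.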

After downloading the CIFAR-10 data and converting it to TFRecord format, I ran:

DATASET_DIR=/tmp/data/cifar10
TRAIN_DIR=/tmp/train_logs
python3 train_image_classifier.py \
      --train_dir=${TRAIN_DIR} \
      --dataset_name=cifar10 \
      --dataset_split_name=train \
      --dataset_dir=${DATASET_DIR} \
      --model_name=nasnet_cifar \
      --preprocessing_name=cifarnet  \
      --learning_rate=0.025 \
      --optimizer=momentum \
      --learning_rate_decay_type=cosine \
      --num_epochs_per_decay=600.0 \
      --batch_size=32
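
The number of steps implied by these flags, and the learning rate that results, can be sketched in plain Python. The function follows the formula documented for tf.train.cosine_decay. The way decay_steps is derived here (training-set size divided by batch size, times num_epochs_per_decay) is an assumption about how the training script computes it, and 50,000 is the size of the CIFAR-10 training set.

```python
import math


def cosine_decay(learning_rate, global_step, decay_steps, alpha=0.0):
    """Plain-Python sketch of the schedule documented for tf.train.cosine_decay."""
    # The step is clamped, so the rate stays at its final value past decay_steps.
    step = min(global_step, decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    return learning_rate * ((1.0 - alpha) * cosine + alpha)


# Values taken from the training command above.
num_train_samples = 50000  # CIFAR-10 training set size
batch_size = 32
num_epochs_per_decay = 600.0

# Assumed derivation of decay_steps (steps per epoch times epochs per decay).
decay_steps = int(num_train_samples / batch_size * num_epochs_per_decay)
print("decay_steps:", decay_steps)

# Learning rate at the start, midway, at the end of the schedule, and beyond it.
for step in (0, decay_steps // 2, decay_steps, decay_steps + 1000):
    print(step, cosine_decay(0.025, step, decay_steps))
```

With alpha left at 0, the rate reaches exactly zero at decay_steps and stays there for every later step, so any further training steps leave the weights unchanged.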

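Training appears to continue even after 600 epochs (= 937500 steps), but because the cosine decay brings the learning rate to 0 at 600 epochs, the parameters are no longer updated after that point. Running the evaluation script: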

DATASET_DIR=/tmp/data/cifar10
TRAIN_DIR=/tmp/train_logs
python3 eval_image_classifier.py \
      --alsologtostderr \
      --checkpoint_path=${TRAIN_DIR} \
      --dataset_name=cifar10 \
      --dataset_split_name=test \
      --dataset_dir=${DATASET_DIR} \
      --model_name=nasnet_cifar \
      --preprocessing_name=cifarnet

I get the following results:

/home/zelaa/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From eval_image_classifier.py:91: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Scale of 0 disables regularizer.
2018-02-24 19:22:39.646499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:02:00.0
totalMemory: 11.92GiB freeMemory: 11.81GiB
2018-02-24 19:22:39.646538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0, compute capability: 5.2)
WARNING:tensorflow:From eval_image_classifier.py:155: streaming_accuracy (from tensorflow.contrib.metrics.python.ops.metric_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.metrics.accuracy. Note that the order of the labels and predictions arguments has been switched.
WARNING:tensorflow:From eval_image_classifier.py:157: streaming_recall_at_k (from tensorflow.contrib.metrics.python.ops.metric_ops) is deprecated and will be removed after 2016-11-08.
Instructions for updating:
Please use `streaming_sparse_recall_at_k`, and reshape labels from [batch_size] to [batch_size, 1].
INFO:tensorflow:Evaluating train_logs/model.ckpt-1002284
INFO:tensorflow:Starting evaluation at 2018-02-24-18:22:51
2018-02-24 19:22:52.383834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0, compute capability: 5.2)
INFO:tensorflow:Restoring parameters from train_logs/model.ckpt-1002284
INFO:tensorflow:Evaluation [20/200]
INFO:tensorflow:Evaluation [40/200]
INFO:tensorflow:Evaluation [60/200]
INFO:tensorflow:Evaluation [80/200]
INFO:tensorflow:Evaluation [100/200]
INFO:tensorflow:Evaluation [120/200]
INFO:tensorflow:Evaluation [140/200]
INFO:tensorflow:Evaluation [160/200]
INFO:tensorflow:Evaluation [180/200]
INFO:tensorflow:Evaluation [200/200]
eval/Recall_5[0.9985]
eval/Accuracy[0.9577]
INFO:tensorflow:Finished evaluation at 2018-02-24-18:23:26

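So the test error for a single run is 4.23%, which does not correspond to any of the results reported in Learning Transferable Architectures for Scalable Image Recognition. What am I missing here that prevents me from matching the paper's results?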

Reproducing CIFAR-10 results with the TF-Slim NASNet model

0 answers: