FailedPrecondition error when fine-tuning inception_v3 on a custom image dataset

Time: 2017-07-02 11:40:14

Tags: python tensorflow models

I am using the TensorFlow (1.1.0 and 1.2.0) models/slim/scripts to fine-tune inception_v3 on my custom 2-class "logos" image dataset, which I had previously converted to TFRecords.

Docker

Everything runs in a TensorFlow container under Docker for Windows, without a GPU.

docker run -it -v c:/tf_files:/tf_files gcr.io/tensorflow/tensorflow:1.1.0-devel (also 1.2.0-devel)

TFRecords

bazel-bin/inception/build_image_data \
  --train_directory="${TRAIN_DIR}" \
  --validation_directory="${VALIDATION_DIR}" \
  --output_directory="${OUTPUT_DIRECTORY}" \
  --labels_file="${LABELS_FILE}" \
  --train_shards=2 \
  --validation_shards=2 \
  --num_threads=2

with LABELS_FILE=/tmp/data/labels.txt listing the 2 image classes:

ProperLogos
OtherLogos

Models / Slim cloned

/slimtf# git clone https://github.com/tensorflow/models.git

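I added logos.py to the datasets directory and updated dataset_factory.py to support the custom "logos" dataset.
I also changed dataset_utils.py, setting LABELS_FILENAME = 'labels2.txt', because the code I found there expects the "label_id:name" format: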

0:ProperLogos
1:OtherLogos
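
For illustration, here is a minimal sketch of how the one-name-per-line labels.txt could be turned into the "label_id:name" lines shown above. The helper names to_id_name_lines and convert_labels_file are hypothetical and not part of slim, and the sketch assumes ids are assigned from 0 in file order.

```python
def to_id_name_lines(class_names):
    """Return 'id:name' lines, numbering classes from 0 in the given order."""
    return ["%d:%s" % (i, name) for i, name in enumerate(class_names)]


def convert_labels_file(src_path, dst_path):
    """Read one class name per line from src_path and write 'id:name' lines to dst_path.

    Hypothetical helper: the paths are supplied by the caller, and blank
    lines in the source file are skipped.
    """
    with open(src_path) as src:
        names = [line.strip() for line in src if line.strip()]
    with open(dst_path, "w") as dst:
        dst.write("\n".join(to_id_name_lines(names)) + "\n")
    return names
```

Calling to_id_name_lines(["ProperLogos", "OtherLogos"]) yields the two lines listed above.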

Fine-tuning

/slimtf/models/slim# ./scripts/finetune_inception_v3_on_logos.sh

This launches the first command, train_image_classifier.py, which fine-tunes from /tmp/checkpoints/inception_v3.ckpt:

# Fine-tune only the new layers for 1000 steps.
python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=logos \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --model_name=inception_v3 \
  --checkpoint_path=${PRETRAINED_CHECKPOINT_DIR}/inception_v3.ckpt \
  --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
  --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
  --max_number_of_steps=1000 \
  --batch_size=12 \
  --learning_rate=0.01 \
  --learning_rate_decay_type=fixed \
  --save_interval_secs=60 \
  --save_summaries_secs=60 \
  --log_every_n_steps=100 \
  --clone_on_cpu=True \
  --optimizer=rmsprop \
  --weight_decay=0.00004
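
As a rough sketch of what the two scope flags are meant to do, the snippet below filters variable names by scope prefix in plain Python. It assumes the script matches scopes by name prefix, which is my understanding of train_image_classifier.py and is not confirmed in this post. The variable names are made up for illustration, and split_scopes and names_to_restore are hypothetical helpers.

```python
def split_scopes(flag_value):
    """Split a comma-separated scope flag into a list of scope prefixes."""
    return [s.strip() for s in flag_value.split(",") if s.strip()]


def names_to_restore(variable_names, exclude_scopes_flag):
    """Keep only the variables whose names do not start with an excluded scope."""
    exclusions = split_scopes(exclude_scopes_flag)
    return [
        name for name in variable_names
        if not any(name.startswith(scope) for scope in exclusions)
    ]


# Illustrative variable names only; a real graph has many more.
example_names = [
    "InceptionV3/Conv2d_1a_3x3/weights",
    "InceptionV3/Logits/Conv2d_1c_1x1/weights",
    "InceptionV3/AuxLogits/Conv2d_2b_1x1/biases",
]
restored = names_to_restore(example_names, "InceptionV3/Logits,InceptionV3/AuxLogits")
```

Under that assumption, only the first example name would be restored from the pretrained checkpoint, while the Logits and AuxLogits layers would start from fresh initial values and be the only ones trained.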

It fails with FailedPreconditionError: /tmp/logos/train

 /slimtf/models/slim# ./scripts/finetune_inception_v3_on_logos.sh

INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
INFO:tensorflow:Ignoring --checkpoint_path because a checkpoint already exists in /tmp/models-logos4a/inception_v3
2017-07-02 02:11:51.859377: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-02 02:11:51.859402: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-02 02:11:51.859408: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-02 02:11:51.859413: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-02 02:11:51.859418: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Restoring parameters from /tmp/models-logos4a/inception_v3/model.ckpt-0
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /tmp/models-logos4a/inception_v3/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.FailedPreconditionError'>, /tmp/logos/train
         [[Node: parallel_read/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/TFRecordReaderV2_1, parallel_read/filenames)]]
2017-07-02 02:11:55.725083: W tensorflow/core/kernels/queue_base.cc:303] _6_prefetch_queue/fifo_queue: Skipping cancelled dequeue attempt with queue not closed
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train_image_classifier.py", line 573, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train_image_classifier.py", line 569, in main
    sync_optimizer=optimizer if FLAGS.sync_replicas else None)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 767, in train
    sv.stop(threads, close_summary_writer=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1193, in _single_operation_run
    target_list_as_strings, status, None)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: /tmp/logos/train
         [[Node: parallel_read/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/TFRecordReaderV2_1, parallel_read/filenames)]]

Is anything missing from my train and validation input directories?

/tmp/logos/train

train-00000-of-00002  train-00001-of-00002

/tmp/logos/validation

validation-00000-of-00002  validation-00001-of-00002
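
The error message names /tmp/logos/train, which the listing above shows to be a directory rather than a shard file. One thing that might be worth checking, as an unconfirmed guess, is whether the file pattern in logos.py (not shown in this post) matches that directory instead of the shards inside it. The sketch below uses only the Python standard library to separate a glob pattern's matches into files and directories. classify_matches and demo are hypothetical helpers, and demo recreates the train layout in a temporary directory rather than touching /tmp/logos.

```python
import glob
import os
import tempfile


def classify_matches(pattern):
    """Split the paths matched by a glob pattern into (files, directories)."""
    files, dirs = [], []
    for path in sorted(glob.glob(pattern)):
        if os.path.isdir(path):
            dirs.append(path)
        else:
            files.append(path)
    return files, dirs


def demo():
    """Recreate the train layout from the post in a temp dir and test two patterns."""
    root = tempfile.mkdtemp()
    train_dir = os.path.join(root, "train")
    os.mkdir(train_dir)
    for name in ("train-00000-of-00002", "train-00001-of-00002"):
        open(os.path.join(train_dir, name), "w").close()
    # A pattern rooted at the dataset directory matches the 'train' directory itself.
    top_level = classify_matches(os.path.join(root, "train*"))
    # A pattern rooted inside 'train' matches the shard files.
    nested = classify_matches(os.path.join(train_dir, "train-*"))
    return root, top_level, nested
```

If the pattern used for the real dataset lands in the directories list, the reader would be handed a directory path, which would be consistent with the failure above. Pointing --dataset_dir or the pattern at the shard files themselves would be the next thing to try.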

0 answers:

No answers