Question

我可能缺少明显的东西，但是按照running locally自述文件中概述的步骤进行操作之后，我无法在EC2 V100实例中成功提交训练作业。

到目前为止，我已完成以下步骤：

Tensorflow版本“ 1.13.1”

已将火车转换并测试为TFRecord格式
使用我的数据集的6个类创建了一个新的标签映射pb.txt。
更新了管道配置file以反映路径和类数。

我的最终目录结构如下（+表示文件夹，-表示文件）：

+ models
 + faster_rcnn_resnet101_coco_2018_01_28
   - model.ckpt.data-00000-of-00001
   - model.ckpt.meta
   - model.ckpt.index
 + model
    + train
    + eval
    - pipeline.config

+ data
 - train.record
 - test.record
 - tp_label_map.pbtxt

一个令人担心的问题是，我不知道模型中的train和eval文件夹对应于README。

填充环境变量并启动培训工作，如here所示。

PIPELINE_CONFIG_PATH=/home/ubuntu/models/research/object_detection/models/faster_rcnn_resnet101_coco_2018_01_28/pipeline.config
MODEL_DIR=/home/ubuntu/models/research/object_detection/models/model
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=${NUM_TRAIN_STEPS} \
    --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
    --alsologtostderr

我收到以下警告，它仅在此处悬挂10分钟左右。不上火车。

但是我确实将文件填充到模型目录中（train和eval为空）。

+models
  - events.out.tfevents.1557175306.ip-172-31-32-179
  - graph.pbtxt 
  - model.ckpt-0.data-00000-of-00001  
  - model.ckpt-0.index 
  - model.ckpt-0.meta

如果您查找注释here，但是当我选中nvidia-smi或tensorboard时，则看不到任何内容。

张力板输出

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

*********** In model lib ************* /home/ubuntu/models/research/object_detection/models/faster_rcnn_resnet101_coco_2018_01_28/pipeline.config
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f6c5b26d048>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/ubuntu/models/research/object_detection/builders/dataset_builder.py:80: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From /home/ubuntu/models/research/object_detection/utils/ops.py:472: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/ubuntu/models/research/object_detection/inputs.py:320: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/ubuntu/models/research/object_detection/builders/dataset_builder.py:152: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /home/ubuntu/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py:2298: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/ubuntu/models/research/object_detection/core/losses.py:345: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From /home/ubuntu/models/research/object_detection/eval_util.py:785: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /home/ubuntu/models/research/object_detection/utils/visualization_utils.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.

WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.

Answer 1

关于keyPressControl(event) { if ( event.keyCode == 83 && !this.isOverlayOpen && !document.querySelector('input:focus, textarea:focus') ) { this.staffSearchOpen(); } }的文档确实有些不清楚，但是源代码comment对此有清晰的解释。

因此model_dir是将新检查点文件保存到的目录，与用于微调的预先训练的检查点文件不同，因此您不应将model_dir设置为预先训练的检查点路径。

每次提交新的训练作业时，最好使model_dir不含检查点文件，否则，如果存在检查点文件，则该模型可能会跳过训练（here）。

此处列出了model_dir和train目录以供说明。可以这样设置目录结构，但不必相同。您只需要将一个空目录传递到eval即可保存检查点文件。

在我自己的数据集上运行Tensorflow对象检测训练作业的问题

Tensorflow版本“ 1.13.1”

1 个答案: