Question

I am trying to train an instance segmentation model using the Tensorflow Object Detection API (Mask RCNN), following the instructions here.

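I am using the pretrained mask_rcnn_resnet50_atrous_coco to initialize the weights and adapting this sample config file for the model. I created tfrecord files with masks for the training and evaluation sets based on create_coco_tf_record.py. I can run the training script successfully, but the problem is that it takes around 45 GB of RAM in addition to the GPU memory. Other than that, everything runs fine and I can train for up to 10k steps, after which it decides it needs more RAM and takes up around 60 GB, which crashes my system. The same thing happens when I run the evaluation script after training. My system specs are: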

Ubuntu 16.04
Tensorflow 1.5.0 (installed via pip, per the documentation)
Python 2.7.12
CUDA 9 and cuDNN 7
GTX 1080 (8 GB)
32 GB RAM, 32 GB swap

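I am not sure why tensorflow needs this much RAM when I am running the model on the GPU. I have only 1 foreground class and 500 training samples, with at most 50 objects/masks per image. Below are some screenshots from System Monitor and nvidia-smi, along with my pipeline_config file. If needed, I will also upload my script for creating the tfrecord files.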
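
For a sense of scale, here is a rough back-of-the-envelope sketch of how much host memory the instance masks for one image could occupy. It assumes each mask is held as a dense array of 4-byte values with one height x width plane per object; that storage layout is an assumption, not something confirmed in this post. The 300 x 400 size comes from the image_resizer limits in the config, while the 1080 x 1920 size is a purely hypothetical original image resolution. The helper name mask_bytes is made up for this illustration.

```python
def mask_bytes(num_masks, height, width, bytes_per_value=4):
    """Bytes needed for num_masks dense masks of size height x width.

    Assumes one value per pixel per mask (bytes_per_value=4 for float32).
    """
    return num_masks * height * width * bytes_per_value


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024.0 * 1024.0)


if __name__ == "__main__":
    # 50 masks at the resizer limits from the pipeline config (300 x 400).
    resized = mask_bytes(50, 300, 400)
    # 50 masks at a hypothetical, unresized 1080 x 1920 source image.
    full_size = mask_bytes(50, 1080, 1920)
    print("resized:   %.1f MiB per image" % to_mib(resized))
    print("full size: %.1f MiB per image" % to_mib(full_size))
```

The per-image figure grows linearly with both the number of objects and the pixel count, so any input queue that buffers many decoded examples multiplies it further. Whether that accounts for the 45 GB observed here is not established.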

Here is my pipeline_config:

    # Mask R-CNN with Resnet-50 (v1), Atrous version
    # Configured for MSCOCO Dataset.
    # Users should configure the fine_tune_checkpoint field in the train config as
    # well as the label_map_path and input_path fields in the train_input_reader and
    # eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
    # should be configured.

    model {
      faster_rcnn {
        num_classes: 1
        image_resizer {
          keep_aspect_ratio_resizer {
            min_dimension: 300
            max_dimension: 400
          }
        }
        number_of_stages: 3
        feature_extractor {
          type: 'faster_rcnn_resnet50'
          first_stage_features_stride: 8
        }
        first_stage_anchor_generator {
          grid_anchor_generator {
            scales: [0.25, 0.5, 1.0, 2.0]
            aspect_ratios: [0.5, 1.0, 2.0]
            height_stride: 8
            width_stride: 8
          }
        }
        first_stage_atrous_rate: 2
        first_stage_box_predictor_conv_hyperparams {
          op: CONV
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.01
            }
          }
        }
        first_stage_nms_score_threshold: 0.0
        first_stage_nms_iou_threshold: 0.7
        first_stage_max_proposals: 300
        first_stage_localization_loss_weight: 2.0
        first_stage_objectness_loss_weight: 1.0
        initial_crop_size: 14
        maxpool_kernel_size: 2
        maxpool_stride: 2
        second_stage_box_predictor {
          mask_rcnn_box_predictor {
            use_dropout: true
            dropout_keep_probability: 0.5
            predict_instance_masks: true
            mask_height: 33
            mask_width: 33
            mask_prediction_conv_depth: 0
            mask_prediction_num_conv_layers: 4
            fc_hyperparams {
              op: FC
              regularizer {
                l2_regularizer {
                  weight: 0.0
                }
              }
              initializer {
                variance_scaling_initializer {
                  factor: 1.0
                  uniform: true
                  mode: FAN_AVG
                }
              }
            }
            conv_hyperparams {
              op: CONV
              regularizer {
                l2_regularizer {
                  weight: 0.0
                }
              }
              initializer {
                truncated_normal_initializer {
                  stddev: 0.01
                }
              }
            }
          }
        }
        second_stage_post_processing {
          batch_non_max_suppression {
            score_threshold: 0.0
            iou_threshold: 0.6
            max_detections_per_class: 100
            max_total_detections: 300
          }
          score_converter: SOFTMAX
        }
        second_stage_localization_loss_weight: 2.0
        second_stage_classification_loss_weight: 1.0
        second_stage_mask_prediction_loss_weight: 4.0
        second_stage_batch_size: 4
      }
    }

    train_config: {
      batch_size: 1
      optimizer {
        momentum_optimizer: {
          learning_rate: {
            manual_step_learning_rate {
              initial_learning_rate: 0.0003
              schedule {
                step: 0
                learning_rate: .0003
              }
              schedule {
                step: 900000
                learning_rate: .00003
              }
              schedule {
                step: 1200000
                learning_rate: .000003
              }
            }
          }
          momentum_optimizer_value: 0.9
        }
        use_moving_average: false
      }
      gradient_clipping_by_norm: 10.0
      fine_tune_checkpoint: "/media/ahmed/1A6E52446E5218B9/Projects/TF/MaskRCNN/pretrained_models/mask_rcnn_resnet50_atrous_coco_2018_01_28/model.ckpt"
      from_detection_checkpoint: true
      # Note: The below line limits the training process to 200K steps, which we
      # empirically found to be sufficient enough to train the pets dataset. This
      # effectively bypasses the learning rate schedule (the learning rate will
      # never decay). Remove the below line to train indefinitely.
      num_steps: 50000
      data_augmentation_options {
        random_horizontal_flip {
        }
      }
    }

    train_input_reader: {
      tf_record_input_reader {
        input_path: "/media/ahmed/1A6E52446E5218B9/Projects/TF/MaskRCNN/train_mask.record"
      }
      label_map_path: "/media/ahmed/1A6E52446E5218B9/Projects/TF/label_map.pbtxt"
      load_instance_masks: true
      mask_type: PNG_MASKS
    }

    eval_config: {
      num_examples: 200
      num_visualizations : 200
      # Note: The below line limits the evaluation process to 10 evaluations.
      # Remove the below line to evaluate indefinitely.
      max_evals: 10
    }

    eval_input_reader: {
      tf_record_input_reader {
        input_path: "/media/ahmed/1A6E52446E5218B9/Projects/TF/MaskRCNN/val_mask.record"
      }
      label_map_path: "/media/ahmed/1A6E52446E5218B9/Projects/TF/label_map.pbtxt"
      load_instance_masks: true
      mask_type: PNG_MASKS
      shuffle: false
      num_readers: 1
    }

Update 1: I tried running object_detection_tutorial.ipynb on the frozen_inference_graph. It runs with 2 GB of GPU memory as long as I do not call the mask output node, and I get correct boxes, classes, and scores. When I try to get the output of detection_masks:0 in sess.run(), my 8 GB of GPU memory runs out. I tried running the script in CPU mode and found that I get correct masks, and my RAM usage never exceeded 8 GB (up from 2.5 GB). When running the train.py and eval.py scripts, the same model takes up my entire GPU plus 45 GB of RAM.
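
For anyone who wants to reproduce the comparison in Update 1, the following is a minimal sketch of running a frozen graph with or without the mask output, on CPU or GPU. It assumes the TF 1.x API and the tensor names that the Object Detection API normally exports (image_tensor:0, detection_boxes:0, and so on); the helper names fetch_names and run_frozen_graph and their parameters are made up for this illustration and are not part of the tutorial notebook.

```python
def fetch_names(include_masks):
    """Return the output tensor names to request from the frozen graph."""
    names = [
        "detection_boxes:0",
        "detection_scores:0",
        "detection_classes:0",
        "num_detections:0",
    ]
    if include_masks:
        # Requesting this tensor is what exhausted GPU memory in Update 1.
        names.append("detection_masks:0")
    return names


def run_frozen_graph(graph_path, image_batch, include_masks=True, use_gpu=False):
    """Run one inference pass and return the fetched outputs as a list.

    graph_path:  path to a frozen_inference_graph.pb file (hypothetical parameter).
    image_batch: uint8 array of shape [1, height, width, 3].
    """
    import tensorflow as tf  # imported lazily; requires a TF 1.x install

    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(graph_path, "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")

    # device_count={"GPU": 0} keeps the session on the CPU.
    config = tf.ConfigProto(device_count={"GPU": 1 if use_gpu else 0})
    with tf.Session(graph=graph, config=config) as sess:
        return sess.run(
            fetch_names(include_masks),
            feed_dict={"image_tensor:0": image_batch},
        )
```

Calling run_frozen_graph once with include_masks=False and once with include_masks=True, while watching nvidia-smi or System Monitor, should isolate how much memory the mask branch adds on its own.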

Tensorflow Object Detection API instance segmentation takes up the whole RAM (32 GB)

0 answers