TensorFlow Object Detection API: CUDA out of memory error

Time: 2019-05-23 19:31:24

Tags: python tensorflow

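I am trying to run model_main.py from the TensorFlow Object Detection API, but I am running into memory problems.

Here is some information about my setup:

OS: Ubuntu 19.04

Graphics: GeForce RTX 2060/PCIe/SSE2

Tensorflow-gpu: 1.12

I followed this tutorial. I tried to find out which CUDA version I am running with the command nvcc --version, but it reports that the command was not found.

In the tutorial, I ran the command: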

conda install \
tensorflow-gpu==1.12 \
cudatoolkit==9.0 \
cudnn=7.1.2 \
h5py

So shouldn't it already be installed? Why is my version not showing?
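As far as I understand, the conda cudatoolkit package ships the CUDA runtime libraries rather than the full toolkit, so a missing nvcc does not by itself show that the CUDA libraries are absent. The sketch below only checks whether an nvcc executable is reachable on PATH, using the Python standard library. The helper names find_nvcc and describe_nvcc are made up for illustration.

```python
import shutil
import subprocess


def find_nvcc():
    """Return the full path to nvcc if it is on PATH, otherwise None."""
    return shutil.which("nvcc")


def describe_nvcc():
    """Return the output of `nvcc --version`, or a note that nvcc is missing."""
    path = find_nvcc()
    if path is None:
        return "nvcc not found on PATH"
    # Capture stdout and stderr together so any error text is also returned.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout.strip()
```

Calling print(describe_nvcc()) inside the activated conda environment shows what that environment can see. Running conda list cudatoolkit in the same environment should list the installed runtime version.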

When I try to run model_main.py with python3 model_main.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_pets.config, I get the following error: tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

On earlier runs I got a different but similar error: CUDA OUT OF MEMORY ERROR. I took a snapshot of my GPU memory usage during a run, and the memory is indeed fully used.
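A snapshot like this can also be taken from a script. The following is a minimal sketch that assumes nvidia-smi is installed and on PATH and that the driver reports numeric memory values. The function names parse_memory_line and query_gpu_memory are illustrative only.

```python
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into a tuple of ints."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def query_gpu_memory():
    """Return a list of (used_mib, total_mib) tuples, one per visible GPU."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return [parse_memory_line(line) for line in output.splitlines() if line.strip()]
```

Calling query_gpu_memory() before starting training shows how much memory other processes already hold. The parser raises ValueError if the driver reports a non-numeric value for a field.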

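I found this possible solution, but I am not sure where to insert the line. I don't see a .Session call being made anywhere in model_main.py.

How can I configure TensorFlow so that it does not use 100% of my GPU memory, or choose the percentage to use as described in the link?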
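For reference, here is a minimal sketch of how a memory cap might be passed without an explicit Session, assuming the TensorFlow 1.x API and assuming that model_main.py builds its estimator from a tf.estimator.RunConfig (worth confirming in your copy of the script). The helper names validate_fraction and build_run_config and the default value 0.6 are hypothetical choices, not part of the Object Detection API.

```python
def validate_fraction(fraction):
    """Return fraction as a float if it lies in (0, 1], otherwise raise ValueError."""
    fraction = float(fraction)
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1], got {}".format(fraction))
    return fraction


def build_run_config(model_dir, fraction=0.6, allow_growth=True):
    """Build a RunConfig whose sessions use at most `fraction` of GPU memory.

    Assumes TensorFlow 1.x; the import is local so that the validation
    helper above can be used without TensorFlow installed.
    """
    import tensorflow as tf

    gpu_options = tf.GPUOptions(
        per_process_gpu_memory_fraction=validate_fraction(fraction),
        allow_growth=allow_growth,  # allocate memory on demand instead of up front
    )
    session_config = tf.ConfigProto(gpu_options=gpu_options)
    return tf.estimator.RunConfig(model_dir=model_dir, session_config=session_config)
```

If model_main.py contains a line such as config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir), it could be replaced with config = build_run_config(FLAGS.model_dir). The estimator then creates its sessions with these GPU options. Whether this resolves the initialization error above is not something the sketch can guarantee.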

Here is my .config file, in case it helps:

# SSD with Mobilenet v1, configured for Oxford-IIIT Pets Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 1
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
  from_detection_checkpoint: true
  load_all_detection_checkpoint_vars: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "/home/rahme/Desktop/tensorflow/models/research/object_detection/training/rat_face_detection.pbtxt"
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  num_examples: 2
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "/home/rahme/Desktop/tensorflow/models/research/object_detection/training/rat_face_detection.pbtxt"
  shuffle: false
  num_readers: 1
}

Edit: Added snapshot of GPU processor

0 answers:

No answers yet