Question

I'm running a modified version of the TensorFlow example https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10_estimator, and it runs out of memory.

The ResourceExhausted error says: Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

I tried adding it in the obvious places in main(), but I get variations of a protobuf error saying that the report_tensor_allocations_upon_oom run option cannot be found.

def main(job_dir, data_dir, num_gpus, variable_strategy,
         use_distortion_for_training, log_device_placement, num_intra_threads,
         **hparams):
  # The env variable is on deprecation path, default is set to off.
  os.environ['TF_SYNC_ON_FINISH'] = '0'
  os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'

  # Session configuration.
  sess_config = tf.ConfigProto(
      allow_soft_placement=True,
      log_device_placement=log_device_placement,
      intra_op_parallelism_threads=num_intra_threads,
      report_tensor_allocations_upon_oom = True, # Nope
      gpu_options=tf.GPUOptions(
           force_gpu_compatible=True, 
           report_tensor_allocations_upon_oom = True))  # Nope

  config = cifar10_utils.RunConfig(
      session_config=sess_config, model_dir=job_dir, 
      report_tensor_allocations_upon_oom = True)  #Nope
  tf.contrib.learn.learn_runner.run(
      get_experiment_fn(data_dir, num_gpus, variable_strategy,
                        use_distortion_for_training),
      run_config=config,
      hparams=tf.contrib.training.HParams(
          is_chief=config.is_chief,
          **hparams))

Where in this example do I add report_tensor_allocations_upon_oom = True?
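The protobuf errors come from where the field is declared: as the hint says, report_tensor_allocations_upon_oom belongs to RunOptions, not to ConfigProto or GPUOptions. The sketch below checks this by inspecting each message's protobuf descriptor. It assumes TensorFlow 1.13 or later so that the v1 protos are reachable as tf.compat.v1; the helper has_field is hypothetical, written only for this check.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.13+ and 2.x both expose the v1 protos here
FLAG = "report_tensor_allocations_upon_oom"


def has_field(message_cls, name):
    """Return True if the protobuf message type declares a field named `name`."""
    return name in message_cls.DESCRIPTOR.fields_by_name


for cls in (tf1.ConfigProto, tf1.GPUOptions, tf1.RunOptions):
    print(cls.DESCRIPTOR.name, has_field(cls, FLAG))
```

Only RunOptions declares the field, so the flag has to travel in the options= argument of each session.run() call rather than in the session configuration.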

Answer 1

You need to register a session run hook that passes extra arguments to the session.run() calls the estimator executes.

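import tensorflow as tf


class OomReportingHook(tf.train.SessionRunHook):
  """Adds report_tensor_allocations_upon_oom to every session.run() call."""

  def before_run(self, run_context):
    return tf.train.SessionRunArgs(
        fetches=[],  # no extra fetches
        options=tf.RunOptions(report_tensor_allocations_upon_oom=True))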
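To see the hook mechanism outside an Estimator, the sketch below runs the same hook under a MonitoredSession, the session wrapper that Estimator training is built on. The caller passes no options of its own; the RunOptions returned from before_run are combined into the options of each run call. It assumes TensorFlow 1.13 or later (for tf.compat.v1) and uses a trivial constant graph as a stand-in for the CIFAR-10 model.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # MonitoredSession works on graphs, not eager tensors


class OomReportingHook(tf1.train.SessionRunHook):
    """Adds report_tensor_allocations_upon_oom to every session.run() call."""

    def before_run(self, run_context):
        return tf1.train.SessionRunArgs(
            fetches=[],  # no extra fetches
            options=tf1.RunOptions(report_tensor_allocations_upon_oom=True))


# Trivial graph standing in for the real model. Build it before the session
# is created, because MonitoredSession finalizes the default graph.
x = tf.constant(2.0) * 3.0

with tf1.train.MonitoredSession(hooks=[OomReportingHook()]) as sess:
    # No options passed here; the hook supplies the OOM-report flag.
    result = sess.run(x)

print(result)
```

With a real model, an OOM inside sess.run() would then include the list of live tensor allocations in the ResourceExhausted message.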

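Pass the hook in the hooks list to the relevant Estimator methods (train(), evaluate() and predict() each accept a hooks argument): https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator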
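Besides passing hooks to train(), a model_fn can attach them itself through EstimatorSpec's training_hooks (or evaluation_hooks / prediction_hooks). That second route may suit the question's setup, where learn_runner drives training through an Experiment instead of a direct train() call. The sketch below shows both routes; either one alone is enough. It assumes a TensorFlow release that still ships tf.estimator (1.13 or later for tf.compat.v1), and the toy model_fn and input_fn are hypothetical stand-ins for the CIFAR-10 ones.

```python
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1


class OomReportingHook(tf1.train.SessionRunHook):
    """Adds report_tensor_allocations_upon_oom to every session.run() call."""

    def before_run(self, run_context):
        return tf1.train.SessionRunArgs(
            fetches=[],  # no extra fetches
            options=tf1.RunOptions(report_tensor_allocations_upon_oom=True))


def model_fn(features, labels, mode):
    """Hypothetical toy linear model standing in for the CIFAR-10 model_fn."""
    w = tf1.get_variable("w", shape=[], initializer=tf1.zeros_initializer())
    loss = tf1.reduce_mean(tf1.square(features["x"] * w - labels))
    train_op = tf1.train.GradientDescentOptimizer(0.1).minimize(
        loss, global_step=tf1.train.get_or_create_global_step())
    # Route A: attach the hook from inside the model_fn.
    return tf.estimator.EstimatorSpec(
        mode, loss=loss, train_op=train_op,
        training_hooks=[OomReportingHook()])


def input_fn():
    """Hypothetical input pipeline: one tiny batch, repeated forever."""
    features = {"x": tf.constant([1.0, 2.0, 3.0])}
    labels = tf.constant([2.0, 4.0, 6.0])
    return tf.data.Dataset.from_tensors((features, labels)).repeat()


estimator = tf.estimator.Estimator(model_fn=model_fn,
                                   model_dir=tempfile.mkdtemp())
# Route B: pass the hook per call; evaluate() and predict() take hooks too.
estimator.train(input_fn=input_fn, steps=2, hooks=[OomReportingHook()])
```

In the cifar10_estimator code, route A would mean appending the hook to whatever hook list your modified model_fn already hands to EstimatorSpec; check your copy, since the details may differ.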
