ResourceExhaustedError with a CNN

Date: 2018-07-01 22:55:22

Tags: tensorflow deep-learning

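I am trying to work through this tutorial using its source code.

I tried it with their large image dataset as well as with my own small dataset of 52 images (46x46), but I keep getting a ResourceExhaustedError: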

   ResourceExhaustedError  OOM when allocating tensor with shape[1016064,1024]

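Is there a way to edit this code so that it trains on a smaller training set and avoids this error?

I tried changing the batch size in the code, but that did not help. I also made sure that no earlier TensorFlow process was still running (I restarted the computer).

My label.txt contains these two lines: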

cat
dog

and my train and validation folders each contain 2 subfolders with those same names, which hold the images.


I am using: GeForce GTX 850M major: 5 minor: 0 memoryClockRate(GHz): 0.9015

totalMemory: 4.00GiB freeMemory: 3.35GiB


Just before the error appears, this is printed:

Limit:                  3235767910
InUse:                      223232
MaxInUse:                   223232
NumAllocs:                      17
MaxAllocSize:               204800

Here is my full error:

    2018-07-01 14:55:45.724585: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:279] *___________________________________________________________________________________________________
2018-07-01 14:55:45.725147: W C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1202] OP_REQUIRES failed at random_op.cc:202 : Resource exhausted: OOM when allocating tensor with shape[1016064,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1361, in _do_call
    return fn(*args)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1016064,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: dense/kernel/Initializer/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, _class=["loc:@dense/kernel"], dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense/kernel/Initializer/random_uniform/shape)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Mac/Desktop/tensorflow/cnn_dog_vs_cat-master/cnn_dog_cat.py", line 175, in <module>
    tf.app.run()
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "C:/Users/Mac/Desktop/tensorflow/cnn_dog_vs_cat-master/cnn_dog_cat.py", line 167, in main
    classifier.train(input_fn=lambda: train_input_fn(train_list), steps=10, hooks=[logging_hook])
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 352, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 888, in _train_model
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 384, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 795, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 518, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 981, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 986, in _create_session
    return self._sess_creator.create_session()
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 675, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\monitored_session.py", line 446, in create_session
    init_fn=self._scaffold.init_fn)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\session_manager.py", line 281, in prepare_session
    sess.run(init_op, feed_dict=init_feed_dict)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
    run_metadata_ptr)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1355, in _do_run
    options, run_metadata)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1016064,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: dense/kernel/Initializer/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, _class=["loc:@dense/kernel"], dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense/kernel/Initializer/random_uniform/shape)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op 'dense/kernel/Initializer/random_uniform/RandomUniform', defined at:
  File "C:/Users/Mac/Desktop/tensorflow/cnn_dog_vs_cat-master/cnn_dog_cat.py", line 175, in <module>
    tf.app.run()
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "C:/Users/Mac/Desktop/tensorflow/cnn_dog_vs_cat-master/cnn_dog_cat.py", line 167, in main
    classifier.train(input_fn=lambda: train_input_fn(train_list), steps=10, hooks=[logging_hook])
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 352, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 812, in _train_model
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 793, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "C:/Users/Mac/Desktop/tensorflow/cnn_dog_vs_cat-master/cnn_dog_cat.py", line 50, in cnn_model_fn
    dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\layers\core.py", line 248, in dense
    return layer.apply(inputs)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\layers\base.py", line 809, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\layers\base.py", line 680, in __call__
    self.build(input_shapes)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\layers\core.py", line 134, in build
    trainable=True)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\layers\base.py", line 533, in add_variable
    partitioner=partitioner)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1297, in get_variable
    constraint=constraint)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1093, in get_variable
    constraint=constraint)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 439, in get_variable
    constraint=constraint)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 408, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 800, in _get_single_variable
    use_resource=use_resource)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2157, in variable
    use_resource=use_resource)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2147, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2130, in default_variable_creator
    constraint=constraint)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variables.py", line 233, in __init__
    constraint=constraint)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variables.py", line 327, in _init_from_args
    initial_value(), name="initial_value", dtype=dtype)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 784, in <lambda>
    shape.as_list(), dtype=dtype, partition_info=partition_info)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\init_ops.py", line 472, in __call__
    shape, -limit, limit, dtype, seed=self.seed)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\random_ops.py", line 244, in random_uniform
    shape, dtype, seed=seed1, seed2=seed2)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_random_ops.py", line 473, in _random_uniform
    name=name)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3271, in create_op
    op_def=op_def)
  File "C:\Users\Mac\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1016064,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: dense/kernel/Initializer/random_uniform/RandomUniform = RandomUniform[T=DT_INT32, _class=["loc:@dense/kernel"], dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense/kernel/Initializer/random_uniform/shape)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

1 answer:

Answer 0: (score: 2)

The OOM is caused by the allocation of the dense layer's weights on line 50:

pool2_flat = tf.reshape(pool2, [-1, 126 * 126 * 64])
dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
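As a rough check of why this allocation fails, the sketch below computes the size of the dense layer's weight matrix from the shape in the error message and compares it with the "Limit" value printed in the question. It assumes 4 bytes per float32 value; the helper name `dense_kernel_bytes` is made up for illustration. Because the weight matrix does not depend on the batch size, changing the batch size cannot make it fit.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats ("type float" in the error)


def dense_kernel_bytes(flat_inputs, units, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed for a dense kernel of shape [flat_inputs, units]."""
    return flat_inputs * units * bytes_per_value


# Flattened size hard-coded in tf.reshape: 126 * 126 * 64 values per image.
flat_inputs = 126 * 126 * 64

# Weight matrix of the 1024-unit dense layer, shape [1016064, 1024].
kernel_bytes = dense_kernel_bytes(flat_inputs, 1024)

# "Limit" reported by the allocator in the question.
gpu_limit_bytes = 3235767910

print("flattened inputs:", flat_inputs)
print("kernel size: %d bytes (%.2f GiB)" % (kernel_bytes, kernel_bytes / 2**30))
print("GPU limit:   %d bytes (%.2f GiB)" % (gpu_limit_bytes, gpu_limit_bytes / 2**30))
print("kernel alone exceeds the limit:", kernel_bytes > gpu_limit_bytes)
```

The kernel alone needs roughly 3.9 GiB, which is more than the roughly 3.0 GiB the allocator is allowed to use, before counting gradients or optimizer state.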

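You can:

  • Reduce the number of neurons in the layer (for example, from 1024 to 64)
  • Reduce the size of the input images
  • Increase the downsampling factor of the feature extractor (here it is 4, since there are only 2 max-pooling layers, each with stride 2)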
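To compare the three options above, here is a minimal sketch that estimates the dense kernel size for each one. It rests on several assumptions that are not stated in the source: the convolutions preserve height and width, each pooling layer halves them (rounding down), the last feature map has 64 channels, and the original input is 504x504 (which is what 126 * 4 would imply). The helper names `flat_size` and `kernel_mib` are hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: float32 weights


def flat_size(side, num_pools, channels=64):
    """Flattened feature count after `num_pools` stride-2 pooling layers."""
    for _ in range(num_pools):
        side //= 2  # each stride-2 pooling halves height and width, rounding down
    return side * side * channels


def kernel_mib(flat_inputs, units):
    """Size in MiB of a float32 dense kernel of shape [flat_inputs, units]."""
    return flat_inputs * units * BYTES_PER_FLOAT32 / 2**20


# (flattened inputs, dense units) for the original setup and each suggestion.
options = {
    "original: 504x504 input, 2 poolings, 1024 units": (flat_size(504, 2), 1024),
    "fewer neurons: 64 units": (flat_size(504, 2), 64),
    "smaller input: 46x46": (flat_size(46, 2), 1024),
    "more downsampling: 3 poolings": (flat_size(504, 3), 1024),
}

for name, (flat_inputs, units) in options.items():
    print("%s -> %d inputs, %.2f MiB" % (name, flat_inputs, kernel_mib(flat_inputs, units)))
```

Under these assumptions each option brings the kernel well below the GPU limit reported in the question; the options can also be combined.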

By the way, I strongly recommend against hard-coding shapes in tf.reshape. Consider using tf.layers.flatten instead, which is robust to changes in the architecture.
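To illustrate why deriving the flattened size from the tensor shape is more robust than hard-coding it, here is a pure-Python sketch of the shape rule a flatten layer follows: keep the batch dimension and merge all the others. It does not call TensorFlow; `flattened_shape` is a hypothetical helper, and the 11x11x64 feature map is an assumed example of what a 46x46 input could produce.

```python
from functools import reduce
from operator import mul


def flattened_shape(shape):
    """Keep the first (batch) dimension and merge the remaining ones."""
    batch, *rest = shape
    return (batch, reduce(mul, rest, 1))


# Feature map matching the hard-coded reshape in the answer's snippet.
print(flattened_shape((None, 126, 126, 64)))  # (None, 1016064)

# Assumed smaller feature map: the flattened size follows automatically,
# whereas a hard-coded 126 * 126 * 64 would no longer match.
print(flattened_shape((None, 11, 11, 64)))  # (None, 7744)
```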