Why am I getting a TensorFlow out-of-memory error?

Date: 2020-02-06 15:38:54

Tags: tensorflow

I am doing research on an Nvidia GeForce 2080 Ti GPU. I am trying to build a U-Net model for brain tumor segmentation. My code runs for a few hundred batches and then raises an out-of-memory (OOM) error. What could the problem be? Here is my training code:

'''
    def train_model(self, model):
        history = ""
        print(model.summary())
        for ep in range(self.num_epoch):
            for batch in range(self.number_of_batches):
                print(batch, "/", self.number_of_batches, "/", ep)
                self.batch_images, self.batch_labels = self.get_batch(
                    batch, self.all_files, file_format='channels_first')
                history = model.fit(x=self.batch_images,
                                    y=self.batch_labels,
                                    shuffle=True,
                                    epochs=1,
                                    verbose=1)
                # self.save_model_weights(self.model, history, epoch=batch)
            print("Epoch loss", ep, "==", np.average(history.history['loss']))
            # save weights at the end of each epoch
            self.save_model_weights(model, history, epoch=ep)

'''
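(Aside: the error reports `type double`, which suggests the batch arrays coming out of `get_batch` are float64 NumPy arrays; feeding them as doubles costs twice the memory of float32. Calling `model.fit` once per batch also rebuilds a fresh input pipeline on every call, which is another common source of creeping memory use; Keras's `train_on_batch` avoids that. Below is a minimal NumPy-only sketch of the casting idea; the `to_float32` helper and the array shapes, taken from the OOM message `[5,4,240,240]`, are illustrative assumptions, not code from the question.)

```python
import numpy as np

def to_float32(images, labels):
    # Hypothetical helper: cast both batch arrays to float32,
    # halving host memory versus the default float64.
    return images.astype(np.float32), labels.astype(np.float32)

# Shapes taken from the OOM message: [batch=5, channels=4, 240, 240]
images = np.random.rand(5, 4, 240, 240)                  # float64 by default
labels = (np.random.rand(5, 4, 240, 240) > 0.5).astype(np.float64)

images32, labels32 = to_float32(images, labels)
print(images32.dtype, labels32.dtype)      # float32 float32
print(images.nbytes // images32.nbytes)    # 2 (half the memory per tensor)
```

Casting once inside `get_batch` (or wherever the arrays are produced) would apply this to every batch before it reaches `model.fit`.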

Here is the generated error:
OP_REQUIRES failed at gather_op.cc:155 : Resource exhausted: OOM when allocating tensor with shape[5,4,240,240] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
2020-02-06 20:06:02.680769: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[5,4,240,240] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
     [[{{node GatherV2}}]]
     [[IteratorGetNext]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[IteratorGetNext/_4]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "E:/PyCharmProjects/TF_tutorials/BrainSeg/seg.py", line 335, in <module>
    mu.train_model(segnet)
  File "E:/PyCharmProjects/TF_tutorials/BrainSeg/seg.py", line 296, in train_model
    verbose=1)
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 599, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "E:\PyCharmProjects\TF_tutorials\venv\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[5,4,240,240] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
     [[{{node GatherV2}}]]
     [[IteratorGetNext]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[IteratorGetNext/_4]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted:  OOM when allocating tensor with shape[5,4,240,240] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
     [[{{node GatherV2}}]]
     [[IteratorGetNext]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference_distributed_function_7060]

Function call stack:
distributed_function -> distributed_function

Can anyone help me solve this problem?

0 Answers:

No answers yet.