Question

I'm trying to train ResNet56 on Google Colab with a custom dataset in which every image is 299x299x1. This is the error I get:

ResourceExhaustedError:  OOM when allocating tensor with shape[32,16,299,299] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node resnet/conv2d_21/Conv2D (defined at <ipython-input-15-3b824ba8fe2a>:3) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_train_function_21542]

Function call stack:
train_function

Any ideas?

Answer 1

If you're out of memory, there isn't much you can do.

What I can think of:

Reduce BATCH_SIZE.

Reduce the image input size.
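To see why both options help, here is a rough sketch of how much memory the single tensor named in the error message takes. The helper name `activation_bytes` and the 149x149 input size are made up for illustration, and the estimate covers only this one float32 tensor; training keeps many such activations alive for the backward pass, so the real footprint is much larger.

```python
def activation_bytes(batch, channels, height, width, bytes_per_elem=4):
    """Size in bytes of one dense activation tensor (float32 by default)."""
    return batch * channels * height * width * bytes_per_elem

MIB = 1024 ** 2

# The tensor from the error message: shape [32, 16, 299, 299], type float.
failing = activation_bytes(32, 16, 299, 299)
print(f"{failing / MIB:.1f} MiB")  # 174.6 MiB

# Halving the batch size halves this tensor.
half_batch = activation_bytes(16, 16, 299, 299)
print(f"{half_batch / MIB:.1f} MiB")

# Halving each image side (hypothetical 149x149 input) shrinks it about 4x.
smaller_img = activation_bytes(32, 16, 149, 149)
print(f"{smaller_img / MIB:.1f} MiB")
```

Batch size scales the tensor linearly, while image side length scales it quadratically, so shrinking the input is usually the bigger lever.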

If you go with a smaller batch size, you may also need to lower the learning rate if you find that training is not converging.
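One common heuristic for this, though not the only one, is to scale the learning rate in proportion to the batch size. A minimal sketch: the function name `scaled_lr` is hypothetical, and the starting point (batch size 32, as in the error message, with a learning rate of 0.1) is an assumption, not the asker's actual setting.

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: keep lr / batch_size constant."""
    return base_lr * new_batch / base_batch

# Assumed starting point: batch size 32 with lr = 0.1.
for batch in (32, 16, 8):
    print(batch, scaled_lr(0.1, 32, batch))
```

Treat the result as a first guess and adjust it based on how the loss actually behaves.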

P.S.: SGD does better if you add momentum, as in SGD(lr=1e-1, momentum=0.9).

Answer 2

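I ran into the same error. It was caused by a large image size or a large batch: I was using 512*512 images with a batch size of 10. I reduced the batch size to 2 and it started working for me.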

ResourceExhaustedError when trying to train ResNet on Google Colab
