TensorFlow CNN Training ResourceExhaustedError: OOM when allocating tensor with shape

Asked: 2020-01-13 19:32:41

Tags: tensorflow deep-learning gpu

I am trying to train a deep residual network (ResNet-34, 21,302,722 parameters in total) using TensorFlow 2.0 with a GPU (GeForce 940M). The sequential model is defined as follows:

model = keras.models.Sequential()
model.add(DefaultConv2D(64, kernel_size=7, strides=2,
                        input_shape=[224, 224, 3]))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Activation("relu"))
model.add(keras.layers.MaxPool2D(pool_size=3, strides=2, padding="SAME"))

prev_filters = 64
for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
    strides = 1 if filters == prev_filters else 2
    model.add(ResidualUnit(filters, strides=strides))
    prev_filters = filters

model.add(keras.layers.GlobalAvgPool2D())
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(2, activation="softmax"))
model.summary()

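`DefaultConv2D` and `ResidualUnit` are not defined in the question. For reference, a definition compatible with the code above would look roughly like the following (this is an assumption based on common ResNet implementations, not the asker's exact code):

```python
from functools import partial

import tensorflow as tf
from tensorflow import keras

# A Conv2D with the defaults used throughout the network (assumed definition).
DefaultConv2D = partial(keras.layers.Conv2D, kernel_size=3, strides=1,
                        padding="SAME", use_bias=False)


class ResidualUnit(keras.layers.Layer):
    """Two 3x3 convolutions plus a skip connection (assumed definition)."""

    def __init__(self, filters, strides=1, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.activation = keras.activations.get(activation)
        self.main_layers = [
            DefaultConv2D(filters, strides=strides),
            keras.layers.BatchNormalization(),
            keras.layers.Activation(activation),
            DefaultConv2D(filters),
            keras.layers.BatchNormalization()]
        # When strides > 1 the spatial size changes, so the skip path
        # needs a 1x1 convolution to match shapes before the addition.
        self.skip_layers = []
        if strides > 1:
            self.skip_layers = [
                DefaultConv2D(filters, kernel_size=1, strides=strides),
                keras.layers.BatchNormalization()]

    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        return self.activation(Z + skip_Z)
```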
The model is trained with:

history = model.fit(xtrain, ytrain, epochs=10, validation_data=[xtest, ytest])

xtrain has shape (2000, 224, 224, 3) and xtest has shape (1000, 224, 224, 3).
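As a rough sanity check on sizes (plain arithmetic based on the shapes and parameter count above):

```python
# Size of the float32 training array alone: 2000 x 224 x 224 x 3 values,
# 4 bytes each.
xtrain_bytes = 2000 * 224 * 224 * 3 * 4
print(xtrain_bytes / 2**30)  # ~1.12 GiB

# The 21,302,722 parameters take about 81 MiB in float32; gradients,
# optimizer state, and activations multiply the working set well beyond that.
param_bytes = 21_302_722 * 4
print(param_bytes / 2**20)  # ~81 MiB
```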

I then get this OOM error message:

ResourceExhaustedError: OOM when allocating tensor with shape[256,256,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node residual_unit_28/conv2d_64/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[GroupCrossDeviceControlEdges_0/training/Nadam/Nadam/Const/_287]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_keras_scratch_graph_30479]

Is this error caused by my computer's memory (16 GB of RAM) or by some incorrect configuration?

1 Answer:

Answer 0 (score: 3)

GPU memory and computer memory (RAM) are not the same thing. When you train on a GPU, the layers and the input batches must be loaded into GPU memory, and your model has too many parameters for yours. I looked up your GPU: it has only 2 GB of memory, which is not enough for training an image network of this size. If you want to train on the GPU, I suggest you reduce the number of units in the network, reduce the batch size, or use a smaller model overall.