I am writing a function to estimate the maximum batch size that fits into GPU memory. To test my solution, I wrote a helper that generates many random computational graphs from core Keras layers, compiles them into Keras models, and passes each model to the estimator function shown further down. Here is an example of such a model:
from keras import applications
from keras.layers import (Input, Dense, Activation, Conv2D, MaxPooling2D,
                          Dropout, Flatten, concatenate)
from keras.models import Model, Sequential

# Text branch
text_inputs = Input(shape=(max_words,))  # max_words is defined elsewhere
text_model = Sequential()
text_model.add(Dense(128, activation='relu'))
text_model.add(Activation('softmax'))
text_1 = text_model(text_inputs)

# Image branch: ResNet50 features followed by a small convolutional head
base_model = applications.ResNet50(weights='imagenet', include_top=False,
                                   input_shape=(256, 256, 3))
vision_model = Sequential()
vision_model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Dropout(0.75))
vision_model.add(Flatten())
vision_1 = vision_model(base_model.output)

# Combined image-and-text model; a classification head producing
# `predictions` is assumed here (it was left undefined in the snippet)
merged = concatenate([vision_1, text_1])
predictions = Dense(num_classes, activation='softmax')(merged)  # num_classes defined elsewhere
full_model = Model(inputs=[base_model.input, text_inputs], outputs=[predictions])
full_model.compile(loss='categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])
I added a scaling factor to obtain a conservative estimate and to avoid "out of memory" (OOM) errors during automatic hyperparameter optimisation. Somehow, even with a scale of 4.5, I still get OOM. Here is the estimator function:
from itertools import chain
from math import log, floor
from functools import reduce
import operator as op

import keras.backend as K
from keras.models import Model


def estimate_batch_size(model: Model, available_mem: int,
                        scale_by: float = 5.0,
                        precision: int = 2) -> int:
    """
    :param model: keras Model
    :param available_mem: available memory in bytes
    :param scale_by: scaling factor
    :param precision: float precision: 2 bytes for fp16, 4 bytes for fp32, etc.
    :return: closest power of two below the estimated batch size
    """
    num_params = sum(chain.from_iterable((
        (reduce(op.mul, l.output_shape[1:]) for l in model.layers),
        (K.count_params(x) for x in model.trainable_weights),
        (K.count_params(x) for x in model.non_trainable_weights)
    )))
    max_size = int(available_mem / (precision * num_params * scale_by))
    return int(2 ** floor(log(max_size, 2)))
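For context, the arithmetic inside the function can be reproduced by hand, without Keras. The figures below are assumptions plugged in manually: 8GB of VRAM taken literally as 8·1024³ bytes, fp32 weights (4 bytes), a scale of 4.5, and the 716,333-parameter count discussed further down:

```python
from math import log, floor

# Hand-computed version of the estimate; all figures are assumptions:
# 8 GiB of VRAM, fp32 (4 bytes), scaling factor 4.5
available_mem = 8 * 1024 ** 3
precision = 4
scale_by = 4.5
num_params = 716_333  # parameter count including intermediate placeholders

max_size = int(available_mem / (precision * num_params * scale_by))
batch_size = int(2 ** floor(log(max_size, 2)))  # round down to a power of two
```
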
Given 8GB of VRAM and the following model summary:
Layer (type) Output Shape Param #
=================================================================
input_252 (InputLayer) (None, 50, 45, 1) 0
_________________________________________________________________
dense_696 (Dense) (None, 50, 45, 34) 68
_________________________________________________________________
dense_699 (Dense) (None, 50, 45, 279) 9765
=================================================================
Total params: 9,833
Trainable params: 9,833
Non-trainable params: 0
_________________________________________________________________
None
the function returns a batch size of 1024 (716,333 fp32 parameters, including the intermediate placeholders, multiplied by 4.5). Yet despite the large scaling factor, I still get OOM errors. I know this approach does not account for the placeholders allocated for gradient computation, but I am still puzzled that even a scaling factor of 4.5 fails to produce a safe estimate. Is it possible to obtain a more accurate estimate?
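For completeness, the 716,333 figure can be reproduced from the summary above: the per-sample layer output shapes contribute 50·45·1 + 50·45·34 + 50·45·279 = 706,500 "intermediate" entries, plus the 9,833 weights:

```python
from functools import reduce
import operator as op

# Output shapes copied from the model summary above (batch dim dropped)
output_shapes = [(50, 45, 1), (50, 45, 34), (50, 45, 279)]
intermediates = sum(reduce(op.mul, shape) for shape in output_shapes)  # 706,500
total_weights = 9_833  # "Total params" from the summary

num_params = intermediates + total_weights
print(num_params)  # 716333
```
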