I am currently training some custom models that need at most about 12 GB of GPU memory. My setup has roughly 96 GB of GPU memory, yet Python/Jupyter still manages to eat through all of it until a resource-exhausted error appears. I have been stuck on this particular problem for a while, so any help would be greatly appreciated.
Now, when loading a VGG-based model similar to the following:
from keras.applications.vgg16 import VGG16
from keras.models import Model, Sequential
from keras.layers import Dense, Flatten
import keras
input_shape = (512, 512, 3)
base_model = VGG16(input_shape=input_shape, weights=None, include_top=False)
pixel_branch = base_model.output
pixel_branch = Flatten()(pixel_branch)
new_model = Model(inputs=base_model.input, outputs=pixel_branch)
text_branch = Sequential()
text_branch.add(Dense(32, input_shape=(1,), activation='relu'))
# merged = Merge([new_model, text_branch], mode='concat')
merged = keras.layers.concatenate([new_model.output, text_branch.output])
age = Dense(1000, activation='relu')(merged)
age = Dense(1000, activation='relu')(age)
age = Dense(1)(age)
# show model
# model.summary()
model = Model(inputs=[base_model.input, text_branch.input], outputs=age)
When I run a Jupyter cell with only this code and monitor GPU usage with nvidia-smi, it reads 0%. However, when I replace the code in that cell with the following:
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model, Sequential
from keras.layers import Dense, Flatten
import keras
input_shape = (512, 512, 3)
base_model = InceptionV3(input_shape=input_shape, weights=None, include_top=False)
pixel_branch = base_model.output
pixel_branch = Flatten()(pixel_branch)
new_model = Model(inputs=base_model.input, outputs=pixel_branch)
text_branch = Sequential()
text_branch.add(Dense(32, input_shape=(1,), activation='relu'))
# merged = Merge([new_model, text_branch], mode='concat')
merged = keras.layers.concatenate([new_model.output, text_branch.output])
age = Dense(1000, activation='relu')(merged)
age = Dense(1000, activation='relu')(age)
age = Dense(1)(age)
# show model
# model.summary()
model = Model(inputs=[base_model.input, text_branch.input], outputs=age)
GPU usage goes wild: nearly all of the memory on every GPU is suddenly exhausted, even before I call model.compile() or model.fit() in Keras!
I have also tried allow_growth and per_process_gpu_memory_fraction in TensorFlow. With the Inception-based model I still get a resource-exhausted error when running model.fit(). Note that I don't believe this is a plain out-of-GPU-memory error, since I am using 8 Tesla K80 instances with about 96 GB of GPU memory in total.
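For reference, this is roughly how I set those two options — a sketch of the TF1-style session config (assumes the classic keras + tensorflow 1.x setup; the 0.2 fraction is just an illustrative value):

```python
import tensorflow as tf
from keras import backend as K

# Ask TensorFlow to allocate GPU memory on demand instead of grabbing
# it all up front, and cap each process at a fraction of each card.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.2  # illustrative value

# Make Keras use this configured session.
K.set_session(tf.Session(config=config))
```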
Also note that my batch size is 2.
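As a back-of-envelope sanity check (my own arithmetic, not profiler output), the raw input batch at these settings is tiny compared to the ~12 GB the model needs, which is part of why the blow-up is so puzzling:

```python
# Memory for one input batch: batch size 2, 512x512 RGB images, float32.
batch_size = 2
h, w, c = 512, 512, 3
bytes_per_float32 = 4

input_bytes = batch_size * h * w * c * bytes_per_float32
print(input_bytes / 1024 ** 2, "MB")  # 6.0 MB
```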