I am currently training some custom models that need at most about 12 GB of GPU memory. My setup has roughly 96 GB of GPU memory, yet Python/Jupyter still manages to eat through all of it until a resource-exhausted error appears. I have been stuck on this particular problem for a while, so any help would be greatly appreciated.
Now, when loading a VGG-based model similar to the following:
from keras.applications.vgg16 import VGG16
from keras.models import Model, Sequential
from keras.layers import Dense, Flatten
import keras
input_shape = (512, 512, 3)
base_model = VGG16(input_shape=input_shape, weights=None, include_top=False)
pixel_branch = base_model.output
pixel_branch = Flatten()(pixel_branch)
new_model = Model(inputs=base_model.input, outputs=pixel_branch)
text_branch = Sequential()
text_branch.add(Dense(32, input_shape=(1,), activation='relu'))
# merged = Merge([new_model, text_branch], mode='concat')
merged = keras.layers.concatenate([new_model.output, text_branch.output])
age = Dense(1000, activation='relu')(merged)
age = Dense(1000, activation='relu')(age)
age = Dense(1)(age)
# show model
# model.summary()
model = Model(inputs=[base_model.input, text_branch.input], outputs=age)
When I run a Jupyter cell with only this code and monitor GPU usage with nvidia-smi, it reads 0%. However, when I replace the code in that cell with the following:
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model, Sequential
from keras.layers import Dense, Flatten
import keras
input_shape = (512, 512, 3)
base_model = InceptionV3(input_shape=input_shape, weights=None, include_top=False)
pixel_branch = base_model.output
pixel_branch = Flatten()(pixel_branch)
new_model = Model(inputs=base_model.input, outputs=pixel_branch)
text_branch = Sequential()
text_branch.add(Dense(32, input_shape=(1,), activation='relu'))
# merged = Merge([new_model, text_branch], mode='concat')
merged = keras.layers.concatenate([new_model.output, text_branch.output])
age = Dense(1000, activation='relu')(merged)
age = Dense(1000, activation='relu')(age)
age = Dense(1)(age)
# show model
# model.summary()
model = Model(inputs=[base_model.input, text_branch.input], outputs=age)
GPU usage goes wild: nearly all of the memory on every GPU is suddenly exhausted, even before I call model.compile() or model.fit() in Keras!
I have also tried allow_growth and per_process_gpu_memory_fraction in TensorFlow. With the Inception-based model I still get a resource-exhausted error when running model.fit(). Note that I don't believe this is a plain out-of-GPU-memory error, since I am using 8 Tesla K80 instances with about 96 GB of GPU memory in total.
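For reference, this is roughly how I set those two options — a sketch of the TF1-style session config (assumes the classic keras + tensorflow 1.x setup; the 0.2 fraction is just an illustrative value):

```python
import tensorflow as tf
from keras import backend as K

# Ask TensorFlow to allocate GPU memory on demand instead of grabbing
# it all up front, and cap each process at a fraction of each card.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.2  # illustrative value

# Make Keras use this configured session.
K.set_session(tf.Session(config=config))
```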
Also note that my batch size is 2.
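As a back-of-envelope sanity check (my own arithmetic, not profiler output), the raw input batch at these settings is tiny compared to the ~12 GB the model needs, which is part of why the blow-up is so puzzling:

```python
# Memory for one input batch: batch size 2, 512x512 RGB images, float32.
batch_size = 2
h, w, c = 512, 512, 3
bytes_per_float32 = 4

input_bytes = batch_size * h * w * c * bytes_per_float32
print(input_bytes / 1024 ** 2, "MB")  # 6.0 MB
```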