Problem allocating GPU memory

Question

I am trying to do machine learning in Python 3. But when I try to run my code, I get this error with CUDA 10.0 / cuDNN 7.5.0. Can anyone help me?

RTX 2080

I am using: Keras (2.2.4), tf-nightly-gpu (1.14.1.dev20190510)

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Error from the code: tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Here is my code:

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(50, 50, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(1, activation='softmax'))

model.summary()

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x, y, epochs=1, batch_size=n_batch)

OOM when allocating tensor with shape[24946,32,48,48] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Answer 1

There are 2 possible solutions.

Problem allocating GPU memory

Add the following code

import tensorflow as tf
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
config = tf.ConfigProto(gpu_options=gpu_options)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

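Also check this issue.

Problem with your NVIDIA driver

As stated there, you need to upgrade your NVIDIA driver to the ODE driver.

Please check the NVIDIA Documentation for the driver version.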

Answer 2

With Tensorflow 2.0, CUDA 10.0 and CUDNN 7.5, the following worked for me:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

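Some other answers (such as venergiac's) use outdated Tensorflow 1.x syntax. If you are using the latest Tensorflow, you need to use the code I have given here.

If you get the following error: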

Physical devices cannot be modified after being initialized

then the problem is resolved by placing the gpus = tf.config ... lines directly below the line where you import tensorflow, i.e.:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
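
The lines above assume that at least one GPU is visible, since gpus[0] fails on an empty list. As a minimal sketch, the same two calls can be wrapped so that they cover every GPU and report the "Physical devices cannot be modified" case instead of crashing. The helper name enable_memory_growth is only illustrative and is not part of Tensorflow:

```python
import tensorflow as tf


def enable_memory_growth():
    """Turn on memory growth for every visible GPU (hypothetical helper).

    Returns the list of devices that were configured successfully.
    """
    configured = []
    for gpu in tf.config.experimental.list_physical_devices('GPU'):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured.append(gpu)
        except RuntimeError as e:
            # Raised when the GPUs were already initialized before this call.
            print(e)
    return configured


# Call this right after the import, before any model or tensor is created.
configured_gpus = enable_memory_growth()
print(len(configured_gpus), "GPU(s) set to grow memory on demand")
```

On a machine without a GPU the loop simply does nothing and the list stays empty.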

Answer 3

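If you are using Tensorflow 2.0, Roko's answer should work.

If you want to set an exact amount of memory as the limit (for example 1024MB or 2GB), there is another way to restrict GPU memory usage.

Use this code: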

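import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Cap the first GPU at 1024 MB by creating a single virtual device on it.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Virtual devices must be configured before the GPUs are initialized.
        print(e)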

This code limits the memory usage of your first GPU to 1024MB. Just change the index of gpus and the memory_limit as needed.
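
When picking a memory_limit, it can help to estimate how much memory the tensor in the question's OOM message needs. The sketch below is plain arithmetic and rests on two assumptions: that "type float" means 4-byte float32, and that the first dimension (24946) is the batch size passed to model.fit. The helper name tensor_bytes is made up for this example:

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor (float32 assumed by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape taken from the OOM message in the question.
size = tensor_bytes((24946, 32, 48, 48))
print(size)                     # 7356874752
print(round(size / 2**30, 2))   # 6.85 (GiB)

# Same layer output with a batch of 32 samples instead of 24946.
small = tensor_bytes((32, 32, 48, 48))
print(small / 2**20)            # 9.0 (MiB)
```

Under those assumptions this single activation needs about 6.85 GiB, far more than a 1024MB limit allows, so a memory limit would probably need to be combined with a smaller batch_size.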
