I have a Keras model that was trained on 8 GPUs. This means the model was built with blocks like with tf.device('gpu:0'). Now I want to do transfer learning on another machine that has 4 GPUs. However, this results in an error, most likely because the model was trained on more GPUs than are available there (error: could not set cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM). In the error log I can also see a warning that TensorFlow is trying to allocate the gradients on devices GPU 0-7. Is there a way to adjust or clear the devices in a trained model that was configured with Keras?
FYI: I do not have a MetaGraph file, because the model was saved with Keras as well, not with TensorFlow's saver functionality.
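For what it is worth, the multi-GPU settings are baked into the saved architecture itself. A minimal sketch (the file name baseline_model.h5 is assumed) that prints them after reloading:
from keras.models import load_model

model = load_model('baseline_model.h5')  # assumed file name
for layer in model.layers:
    if layer.name.startswith('lambda'):
        # each slicing layer still carries e.g. {'n_gpus': 8, 'part': 3}
        print(layer.name, layer.arguments)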
Current attempts
I tried changing the layer attributes, but that did not make it work:
import numpy as np

# Re-assign the Lambda slicing layers to the smaller number of GPUs
track = 0
for i in range(len(model.layers)):
    if model.layers[i].name[:6] == 'lambda':
        model.layers[i].arguments['n_gpus'] = n_gpus
        if model.layers[i].arguments['part'] > n_gpus-1:
            model.layers[i].arguments['part'] = np.arange(n_gpus)[track]
            track += 1
            if track > n_gpus-1:
                track = 0
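Changing arguments on already-built Lambda layers does not rebuild the underlying graph, so a more promising route may be to rebuild the wrapper around the nested single-GPU model. A sketch, not a verified fix: it assumes the nested sub-model is the only layer with a non-empty layers attribute, and it reuses to_multi_gpu from the script below.
inner = next(l for l in model.layers if getattr(l, 'layers', None))  # nested single-GPU model
model_4gpu = to_multi_gpu(inner, n_gpus=4)  # to_multi_gpu as defined below
model_4gpu.compile(loss='categorical_crossentropy', optimizer='adam',
                   metrics=['accuracy'])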
I also tried setting the number of visible devices, which did not work either:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1,2,3"
Script that created the model on 8 GPUs
"""
to_multi_gpu & slice_batch by: https://github.com/fchollet/keras/issues/2436
baseline_model by: http://machinelearningmastery.com/
"""
from keras import backend as K
from keras.models import Sequential, Model
from keras.layers import Dense, Input, Lambda, merge
import tensorflow as tf


def slice_batch(x, n_gpus, part):
    """
    Divide the input batch into [n_gpus] slices, and obtain slice no. [part],
    i.e. if len(x)=10, then slice_batch(x, 2, 1) will return x[5:].
    x: input batch (input shape of model)
    n_gpus: number of gpus
    part: id of current gpu
    return: batch slice for the given gpu
    """
    sh = K.shape(x)
    L = sh[0] // n_gpus
    if part == n_gpus - 1:
        return x[part*L:]
    return x[part*L:(part+1)*L]


def to_multi_gpu(model, n_gpus):
    """
    Given a keras [model], return an equivalent model which parallelizes
    the computation over [n_gpus] GPUs.
    Each GPU gets a slice of the input batch, applies the model on that slice,
    and later the outputs of the models are concatenated into a single
    tensor, hence the user sees a model that behaves the same as the original.
    model: sequential model created with the Keras library
    n_gpus: number of gpus
    return: model divided over n_gpus
    """
    # Only divide the model over multiple gpus if there is more than one
    if n_gpus > 1:
        with tf.device('/cpu:0'):
            x = Input(model.input_shape[1:])  # , name=model.input_names[0]

        towers = []
        # Divide the model over the gpus
        for g in range(n_gpus):
            # Work on GPU number g.
            with tf.device('/gpu:' + str(g)):
                # Obtain the g-th slice of the batch.
                slice_g = Lambda(slice_batch, lambda shape: shape,
                                 arguments={'n_gpus': n_gpus, 'part': g})(x)
                # Apply the model on the batch slice.
                towers.append(model(slice_g))

        # Merge the multi-gpu outputs on the cpu
        with tf.device('/cpu:0'):
            merged = merge(towers, mode='concat', concat_axis=0)
        return Model(input=[x], output=merged)
    else:
        return model


def baseline_model(num_pixels, num_classes, n_gpus):
    # Create the single-GPU model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, init='normal', activation='relu'))
    model.add(Dense(num_classes, init='normal', activation='softmax'))
    model = to_multi_gpu(model, n_gpus)
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


if __name__ == '__main__':
    model = baseline_model(784, 9, 8)
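The answer below reloads baseline_model.h5, so presumably the __main__ block ended by saving the compiled model with Keras' own serializer, along these lines (file name assumed):
    model.save('baseline_model.h5')  # stores architecture + weights, but no TensorFlow MetaGraph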
Answer 0 (score: 0)
Solved it with the settings below. However, the model now runs on the CPU instead of the GPUs. Since I am only fine-tuning the last layer of this model, that is not a big problem. But if you want to reload and retrain the whole model, this answer may not be satisfactory.
The important settings are os.environ['CUDA_VISIBLE_DEVICES'] = "" and allow_soft_placement=True.
The first masks all GPUs; the second makes TensorFlow automatically place the model on the available devices (in this case the CPU).
Example code
import os
# Must be set before TensorFlow is imported, so that all GPUs stay hidden
os.environ['CUDA_VISIBLE_DEVICES'] = ""

import tensorflow as tf
from keras.models import load_model
from keras import backend as K

if __name__ == '__main__':
    model = load_model('baseline_model.h5')
    init = tf.global_variables_initializer()
    gpu_options = tf.GPUOptions(allow_growth=True)
    # Add ops to save and restore all the variables.
    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options,
                                          allow_soft_placement=True,
                                          log_device_placement=True)) as sess:
        K.set_session(sess)
        sess.run(init)
        tf.train.start_queue_runners(sess=sess)
        # Call model.fit here
    sess.close()  # redundant: the with-block already closes the session
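A sketch of the last-layer fine-tuning mentioned above, meant to run where the # Call model.fit here comment sits. It assumes the nested single-GPU model is the only layer with a non-empty layers attribute, and that training data (x_train, y_train) is already loaded:
# Hedged sketch: freeze everything except the last layer of the nested
# single-GPU model, recompile, and fine-tune on the CPU.
inner = next(l for l in model.layers if getattr(l, 'layers', None))
for layer in inner.layers[:-1]:
    layer.trainable = False
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
# model.fit(x_train, y_train, nb_epoch=5, batch_size=128)  # data assumed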