Keras: loading HDF5 checkpoint weights generated on multiple GPUs

Asked: 2016-12-27 09:05:45

Tags: tensorflow hdf5 keras

Checkpoint snippet:

checkpointer = ModelCheckpoint(
    filepath=os.path.join(savedir, "mid/weights.{epoch:02d}.hd5"),
    monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False)
hist = model.fit_generator(
    gen.generate(batch_size=batch_size, nb_classes=nb_classes),
    samples_per_epoch=593920, nb_epoch=nb_epoch, verbose=1, callbacks=[checkpointer],
    validation_data=gen.vld_generate(VLD_PATH, batch_size=64, nb_classes=nb_classes),
    nb_val_samples=10000)

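I trained my model on a multi-GPU host, which dumps the "mid" checkpoint files in HDF5 format. When I load them on a single-GPU machine with keras.load_weights('mid'), I get the following error: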

Using TensorFlow backend.
Traceback (most recent call last):
  File "server.py", line 171, in <module>
    model = load_model_and_weights('zhch.yml', '7_weights.52.hd5')
  File "server.py", line 16, in load_model_and_weights
    model.load_weights(os.path.join('model', weights_name))
  File "/home/lz/code/ProjectGo/meta/project/libpolicy-server/.virtualenv/lib/python3.5/site-packages/keras/engine/topology.py", line 2701, in load_weights
    self.load_weights_from_hdf5_group(f)
  File "/home/lz/code/ProjectGo/meta/project/libpolicy-server/.virtualenv/lib/python3.5/site-packages/keras/engine/topology.py", line 2753, in load_weights_from_hdf5_group
    str(len(flattened_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 1 layers into a model with 21 layers.

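Is there any way to load checkpoint weights generated on multiple GPUs on a single-GPU machine? No Keras issue seems to discuss this problem, so any help would be appreciated.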

2 answers:

Answer 0 (score: 3)

You can load the model on a single GPU like this:

from keras.models import load_model

multi_gpus_model = load_model('mid')
origin_model = multi_gpus_model.layers[-2]  # you can use multi_gpus_model.summary() to see the layer of the original model
origin_model.save_weights('single_gpu_model.hdf5')

'single_gpu_model.hdf5' is the file that you can load into the model on the single-GPU machine.
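The index -2 above depends on how the multi-GPU model was assembled. As a minimal sketch, assuming the wrapper exposes the original model as a nested layer that has its own `.layers` attribute (which is what the summary() hint implies), the nested model could be located without hardcoding its position. The helper name `find_wrapped_model` is hypothetical and not part of Keras.

```python
def find_wrapped_model(multi_gpu_model):
    """Return the first layer of `multi_gpu_model` that is itself a model.

    Assumption: the multi-GPU wrapper holds the original model as a nested
    layer with its own `.layers` attribute; ordinary layers have none.
    """
    for layer in multi_gpu_model.layers:
        if hasattr(layer, 'layers'):
            return layer
    raise ValueError('no nested model found among the layers')


# Hypothetical usage with a real Keras model:
# origin_model = find_wrapped_model(load_model('mid'))
# origin_model.save_weights('single_gpu_model.hdf5')
```

If no layer carries a `.layers` attribute, the function raises instead of silently returning the wrong layer, which seems safer than a fixed index.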

Answer 1 (score: 0)

Try this function:

def keras_model_reassign_weights(model_cpu, model_gpu):
    """Copy weights from a multi-GPU model into a plain model, matching layers by name."""
    weights_temp = {}
    print('_' * 5, 'Collecting weights from GPU model', '_' * 5)
    for layer in model_gpu.layers:
        # Only the nested (original) model has its own `.layers`; skip everything else.
        if not hasattr(layer, 'layers'):
            print('Skipped: ', layer.name)
            continue
        for layer_unw in layer.layers:
            weights_temp[layer_unw.name] = layer_unw.get_weights()
        break  # stop after the first nested model
    print('_' * 5, 'Writing weights to CPU model', '_' * 5)
    for layer in model_cpu.layers:
        try:
            layer.set_weights(weights_temp[layer.name])
        except (KeyError, ValueError):
            # No weights were collected under this name, or the shapes do not match.
            print(layer.name, 'weights were not set for this layer!')
    return model_cpu

But you first need to load the weights into your GPU model:

# Load or initialize your Keras multi-GPU model (placeholder: replace None)
model_gpu = None
# Load or initialize a Keras model with the same structure, built without the Keras multi-GPU function (placeholder: replace None)
model_cpu = None
# Load the checkpoint weights into the multi-GPU model
model_gpu.load_weights(r'gpu_model_best_checkpoint.hdf5')
# Run the transfer function
model_cpu = keras_model_reassign_weights(model_cpu, model_gpu)
# Save the resulting weights for the CPU model
model_cpu.save_weights(r'CPU_model.hdf5')

After the transfer, you can use the weights with a single-GPU or CPU model.
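To check that a converted file holds one entry per layer rather than a single wrapped model, the layer names stored in the HDF5 file can be listed. This is a rough sketch, assuming h5py is installed and that the file follows the Keras HDF5 layout, where the names are kept in a `layer_names` attribute (at the root for weights-only files, or under a `model_weights` group for full-model files). The helper name `list_saved_layer_names` is hypothetical.

```python
import h5py


def list_saved_layer_names(path):
    """Return the layer names recorded in a Keras HDF5 weights or model file."""
    with h5py.File(path, 'r') as f:
        group = f
        # Full-model files keep the weights under a 'model_weights' group.
        if 'layer_names' not in f.attrs and 'model_weights' in f:
            group = f['model_weights']
        names = group.attrs['layer_names']
        # Names may be stored as bytes; decode them to str for display.
        return [n.decode('utf8') if isinstance(n, bytes) else str(n) for n in names]


# Hypothetical usage:
# print(list_saved_layer_names('CPU_model.hdf5'))
```

A checkpoint written from the multi-GPU model would be expected to show far fewer names here than the single-GPU model has layers, which matches the ValueError in the question.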