Cannot load optimizer weights after adding a layer with no parameters

Date: 2019-11-12 02:49:37

Tags: python tensorflow keras tensorflow2.0

Model A

ipt = Input(batch_shape=(32, 240, 4))
x1  = Conv1D(16, 20,  strides=200, padding='same')(ipt)
x1  = BatchNormalization()(x1)
x2  = Conv1D(16, 200, strides=120, padding='same')(ipt)
x2  = BatchNormalization()(x2) # ...

Model B

ipt = Input(batch_shape=(32, 250, 4))
x1  = Conv1D(16, 20,  strides=200)(ipt)
x1  = BatchNormalization()(x1)
x2  = Conv1D(16, 200, strides=120)(ipt)
x2  = BatchNormalization()(x2) # ...


The two models have identical weight shapes; however, because B has a different build order, A's optimizer weights cannot be loaded onto B (images and code below).

This is a small excerpt of a larger model whose timesteps parameter needs to change every X epochs, and ZeroPadding1D appears to change the layer build order when used. This does not affect the model weights, as those are mapped via a dictionary; the optimizer weights, however, are mapped sequentially, list to list.
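
To illustrate the asymmetry, a minimal sketch (assuming model_A and model_B from the reproducible code below, and that layers carry matching names across the two models, e.g. the explicit bn_1 / bn_2 / gap_0 names seen in the summaries):

# model weights: HDF5 stores them grouped per layer name, so a name-based
# load survives the reordering (requires matching layer names)
model_A.save_weights('A.h5')
model_B.load_weights('A.h5', by_name=True)

# optimizer weights: a flat ordered list, applied position by position,
# which is exactly what breaks here (raises the ValueError shown below)
model_B.optimizer.set_weights(model_A.optimizer.get_weights())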

Reproducible in both TF1 and TF2, with both keras and tf.keras imports. What is causing this, and how can it be fixed? Relevant Git


Environment: Win-10 OS, CUDA 10.0.130, cuDNN 7.6.0, Python 3.7.4, GTX 1070

Observations

  • Holds when swapping in any other layer, not just BatchNormalization, and with any number of layers before concatenate; the optimizer weights simply end up swapped in .get_weights() (see the diagnostic sketch after this list)
  • Can reproduce by changing strides instead of batch_shape[1]
  • Also occurs when using MaxPooling1D with strides > 1
  • padding='valid' also leads to a ZeroPadding1D, but changes the build order (don't know why)
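
That swap can be made visible directly; a diagnostic sketch (assuming model_A and model_B built and trained as in the reproducible code below):

wA = model_A.optimizer.get_weights()
wB = model_B.optimizer.get_weights()
# compare the ordered slot-variable shapes entry by entry
for i, (a, b) in enumerate(zip(wA, wB)):
    print(i, a.shape, b.shape, '' if a.shape == b.shape else '<-- swapped')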

model_A.summary()

Layer (type)                    Output Shape         Param #     Connected to     
==================================================================================
input_1 (InputLayer)            [(32, 240, 4)]       0                            
__________________________________________________________________________________
conv1d (Conv1D)                 (32, 2, 16)          1296        input_1[0][0]    
__________________________________________________________________________________
conv1d_1 (Conv1D)               (32, 2, 16)          12816       input_1[0][0]    
__________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)          64          conv1d[0][0]     
__________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 2, 16)          64          conv1d_1[0][0]   
__________________________________________________________________________________
concatenate (Concatenate)       (32, 2, 32)          0           bn_1[0][0]       
                                                                 bn_2[0][0]       
__________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)             0           concatenate[0][0]
__________________________________________________________________________________
dense (Dense)                   (32, 1)              33          gap_0[0][0]      

model_B.summary() (note the swapped layers)

input_2 (InputLayer)            [(32, 250, 4)]       0                               
_____________________________________________________________________________________
conv1d_2 (Conv1D)               (32, 2, 16)          1296        input_2[0][0]       
_____________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)          64          conv1d_2[0][0]      
_____________________________________________________________________________________
conv1d_3 (Conv1D)               (32, 3, 16)          12816       input_2[0][0]       
_____________________________________________________________________________________
zero_padding1d (ZeroPadding1D)  (32, 3, 16)          0           bn_1[0][0]          
_____________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 3, 16)          64          conv1d_3[0][0]      
_____________________________________________________________________________________
concatenate_1 (Concatenate)     (32, 3, 32)          0           zero_padding1d[0][0]
                                                                 bn_2[0][0]          
_____________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)             0           concatenate_1[0][0] 
_____________________________________________________________________________________
dense_1 (Dense)                 (32, 1)              33          gap_0[0][0]  

Minimal reproducible code

# also works with `from keras`
from tensorflow.keras.layers import Input, Conv1D, ZeroPadding1D, concatenate
from tensorflow.keras.layers import BatchNormalization, Dense, GlobalAveragePooling1D
from tensorflow.keras.models import Model
import numpy as np

def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)

    x1  = Conv1D(16, 20,  strides=200, padding='same')(ipt)
    x1  = BatchNormalization()(x1)
    x2  = Conv1D(16, 200, strides=120, padding='same')(ipt)
    x2  = BatchNormalization()(x2)

    x1, x2 = zero_pad(x1, x2)
    preout = concatenate([x1, x2])
    preout = GlobalAveragePooling1D()(preout)
    out    = Dense(1)(preout)

    model  = Model(ipt, out)
    model.compile('adam', 'mse')
    return model 

def zero_pad(x1, x2):
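    # left-pad the shorter branch along timesteps so both can be concatenated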
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if   diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
    return x1, x2

def make_data(batch_shape):
    return (np.random.randn(*batch_shape), 
            np.random.randint(0, 2, (batch_shape[0], 1)))

batch_shape_A = (32, 240, 4)
batch_shape_B = (32, 250, 4)
batch_shape_C = (32, 240, 4)
model_A  = make_model(batch_shape_A)
model_B  = make_model(batch_shape_B)
model_C  = make_model(batch_shape_C) # 'control group'
x_A, y_A = make_data(batch_shape_A)
x_B, y_B = make_data(batch_shape_B)
x_C, y_C = make_data(batch_shape_C)

model_A.train_on_batch(x_A, y_A)
model_B.train_on_batch(x_B, y_B)
model_C.train_on_batch(x_C, y_C)

optimizer_weights_A = model_A.optimizer.get_weights()

model_C.optimizer.set_weights(optimizer_weights_A)
print("model_C optimizer weights set successfully")

model_B.optimizer.set_weights(optimizer_weights_A)
print("model_B optimizer weights set successfully") # will not print

Output

model_C optimizer weights set successfully

ValueError: Optimizer weight shape (16,) not compatible with provided 
weight shape (200, 4, 16)

1 Answer:

Answer 0 (score: 0)

Found a workaround, and a form of explanation; it is not about ZeroPadding1D per se, but about one "branch" having an additional layer that the other lacks, as revealed by plot_model(); see below.

Keras appears to build layers by vertical traversal: note that the numbered layer graphs agree exactly with the .summary() order. The reordering may still occur at the end of a "branch"; I suppose the reasoning is that the layer nodes of both branches should sit at the same depth before merging into a common layer. However, this is not the full story; see the disclaimer at the bottom.
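
The build order is easy to inspect directly; a small sketch (assuming the models from the question's code), since a model's .layers list follows build order and matches .summary() top to bottom:

for m in (model_A, model_B):
    print([type(l).__name__ for l in m.layers])
# model_B lists BatchNormalization before its second Conv1D, as in its summary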

Workaround: insert a "pseudolayer" into each branch to equalize the layer counts; since a ZeroPadding1D with zero padding is a weightless identity op, I'll stick with z-padding:

def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if   diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
        x2 = ZeroPadding1D((0, 0))(x2)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
        x1 = ZeroPadding1D((0, 0))(x1)
    return x1, x2
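
As a sanity check for the workaround, a small sketch (assuming model_A and model_B are rebuilt with the patched zero_pad above): the layers that actually hold weights should now come in the same relative order in both models, even though model_B carries two extra weightless ZeroPadding1D layers:

for m in (model_A, model_B):
    print([type(l).__name__ for l in m.layers if l.weights])
# both should print: ['Conv1D', 'Conv1D', 'BatchNormalization',
#                     'BatchNormalization', 'Dense']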

Running the code from the question:

model_C optimizer weights set successfully
model_B optimizer weights set successfully  # SUCCESS

Model graphs: via from tensorflow.keras.utils import plot_model; plot_model(model_A) ...
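
A runnable version, for reference (requires pydot and graphviz installed; the file names are illustrative):

from tensorflow.keras.utils import plot_model
plot_model(model_A, to_file='model_A.png', show_shapes=True)
plot_model(model_B, to_file='model_B.png', show_shapes=True)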

[plot_model() graphs of model_A and model_B]


Explanation disclaimer: I have not confirmed this against the exact source code lines, and .summary() does not always agree with plot_model(); for example, with padding='valid', model_B yields model_A's plot_model() graph above, but its summary shows model_B's build order. Also, padding='valid' works without the fix, since both models then end up with a ZeroPadding1D, so the layer structure is (superficially) identical.