Model A:
ipt = Input(batch_shape=(32, 240, 4))
x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
x1 = BatchNormalization()(x1)
x2 = Conv1D(16, 200, strides=120, padding='same')(ipt)
x2 = BatchNormalization()(x2) # ...
Model B:
ipt = Input(batch_shape=(32, 250, 4))
x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
x1 = BatchNormalization()(x1)
x2 = Conv1D(16, 200, strides=120, padding='same')(ipt)
x2 = BatchNormalization()(x2) # ...
A's weights cannot be loaded onto B (images and code below). This is a small piece of a larger model whose timesteps parameter needs to change every X epochs, and ZeroPadding1D appears to change the layer build order when used; this does not affect the model weights, since those are mapped via a dictionary - but the optimizer weights are order-mapped, list to list.

Reproducible in TF1 and TF2, and with both keras and tf.keras imports. What is going on, and how can it be fixed? Relevant Git

Environment: Win-10 OS, CUDA 10.0.130, cuDNN 7.6.0, Python 3.7.4, GTX 1070
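For reference, here is a small helper (my own sketch, not part of the original setup) that makes the list-to-list mismatch visible once two models have been built and trained, e.g. model_A and model_B from the minimal code further down:

# Print the optimizer weight shapes of two trained models side by side to see
# where the list-to-list order diverges (illustrative helper, not from the post).
def compare_optimizer_weights(m1, m2):
    w1, w2 = m1.optimizer.get_weights(), m2.optimizer.get_weights()
    for i, (a, b) in enumerate(zip(w1, w2)):
        mark = "" if a.shape == b.shape else "  <-- shape mismatch"
        print(i, a.shape, b.shape, mark)

# usage, after calling train_on_batch on both models:
# compare_optimizer_weights(model_A, model_B)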
Observations:

- Happens with BatchNormalization - and with any number of layers before concatenate; the optimizer weights simply end up swapped in .get_weights()
- Also happens when varying strides instead of batch_shape[1]
- Also happens when used together with MaxPooling1D with strides > 1
- padding='valid' likewise leads to ZeroPadding1D, but does not change the build order (not sure why)

model_A.summary():
Layer (type)                    Output Shape      Param #     Connected to
==================================================================================
input_1 (InputLayer)            [(32, 240, 4)]    0
__________________________________________________________________________________
conv1d (Conv1D)                 (32, 2, 16)       1296        input_1[0][0]
__________________________________________________________________________________
conv1d_1 (Conv1D)               (32, 2, 16)       12816       input_1[0][0]
__________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)       64          conv1d[0][0]
__________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 2, 16)       64          conv1d_1[0][0]
__________________________________________________________________________________
concatenate (Concatenate)       (32, 2, 32)       0           bn_1[0][0]
                                                              bn_2[0][0]
__________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)          0           concatenate[0][0]
__________________________________________________________________________________
dense (Dense)                   (32, 1)           33          gap_0[0][0]
model_B.summary() (note the swapped layers):
input_2 (InputLayer)            [(32, 250, 4)]    0
_____________________________________________________________________________________
conv1d_2 (Conv1D)               (32, 2, 16)       1296        input_2[0][0]
_____________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)       64          conv1d_2[0][0]
_____________________________________________________________________________________
conv1d_3 (Conv1D)               (32, 3, 16)       12816       input_2[0][0]
_____________________________________________________________________________________
zero_padding1d (ZeroPadding1D)  (32, 3, 16)       0           bn_1[0][0]
_____________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 3, 16)       64          conv1d_3[0][0]
_____________________________________________________________________________________
concatenate_1 (Concatenate)     (32, 3, 32)       0           zero_padding1d[0][0]
                                                              bn_2[0][0]
_____________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)          0           concatenate_1[0][0]
_____________________________________________________________________________________
dense_1 (Dense)                 (32, 1)           33          gap_0[0][0]
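The same swap also shows up programmatically; a quick check of my own, run after building both models with the code below:

# Layer build order, in the same order .summary() prints it:
print([layer.name for layer in model_A.layers])
print([layer.name for layer in model_B.layers])
# In model_B, the first branch's BatchNormalization is built before the
# second Conv1D, unlike in model_A.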
Minimal reproducible code:
# also works with `from keras`
from tensorflow.keras.layers import Input, Conv1D, ZeroPadding1D, concatenate
from tensorflow.keras.layers import BatchNormalization, Dense, GlobalAveragePooling1D
from tensorflow.keras.models import Model
import numpy as np
def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
    x1 = BatchNormalization()(x1)
    x2 = Conv1D(16, 200, strides=120, padding='same')(ipt)
    x2 = BatchNormalization()(x2)
    x1, x2 = zero_pad(x1, x2)
    preout = concatenate([x1, x2])
    preout = GlobalAveragePooling1D()(preout)
    out = Dense(1)(preout)
    model = Model(ipt, out)
    model.compile('adam', 'mse')
    return model

def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
    return x1, x2

def make_data(batch_shape):
    return (np.random.randn(*batch_shape),
            np.random.randint(0, 2, (batch_shape[0], 1)))
batch_shape_A = (32, 240, 4)
batch_shape_B = (32, 250, 4)
batch_shape_C = (32, 240, 4)
model_A = make_model(batch_shape_A)
model_B = make_model(batch_shape_B)
model_C = make_model(batch_shape_C) # 'control group'
x_A, y_A = make_data(batch_shape_A)
x_B, y_B = make_data(batch_shape_B)
x_C, y_C = make_data(batch_shape_C)
model_A.train_on_batch(x_A, y_A)
model_B.train_on_batch(x_B, y_B)
model_C.train_on_batch(x_C, y_C)
optimizer_weights_A = model_A.optimizer.get_weights()
model_C.optimizer.set_weights(optimizer_weights_A)
print("model_C optimizer weights set successfully")
model_B.optimizer.set_weights(optimizer_weights_A)
print("model_B optimizer weights set successfully") # will not print
Output:
model_C optimizer weights set successfully
ValueError: Optimizer weight shape (16,) not compatible with provided
weight shape (200, 4, 16)
Answer:
Found a workaround, and something of an explanation; it is not about ZeroPadding1D itself, but about one "branch" having an extra layer that the other does not - as revealed by plot_model(); see below.

Keras appears to build layers by traversing the graph vertically - note that the numbered layer graphs agree exactly with the .summary() order. The order change can still happen at the end of a "branch" - I suppose the reasoning is that the layer nodes of both branches should sit at the same depth before being merged into a common layer. However, this is not the whole story - see the disclaimer at the bottom.
Workaround: insert a "pseudolayer" into each branch to equalize the per-branch layer count; I'll stick with zero-padding:
def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
        x2 = ZeroPadding1D((0, 0))(x2)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
        x1 = ZeroPadding1D((0, 0))(x1)
    return x1, x2
Running the code from the question:
model_C optimizer weights set successfully
model_B optimizer weights set successfully # SUCCESS
Model graphs: via from tensorflow.keras.utils import plot_model; plot_model(model_A), ...
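For example, to write the graphs to image files with output shapes shown (to_file and show_shapes are standard plot_model options; pydot and graphviz need to be installed):

from tensorflow.keras.utils import plot_model

# Save the layer graphs; show_shapes makes the extra ZeroPadding1D node and the
# branch depths easy to compare.
plot_model(model_A, to_file='model_A.png', show_shapes=True)
plot_model(model_B, to_file='model_B.png', show_shapes=True)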
Explanation disclaimer: I have not confirmed this against the exact source-code lines, and .summary() does not always agree with plot_model(); for example, with padding='valid' we get the plot_model graphs shown above for model_B and model_A, but the summary shows model_B's build order. Also, padding='valid' works without the fix, since both models then end up using ZeroPadding1D, so the layer structures are (superficially) identical.
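That last point can be checked with a small variation of the question's code (my sketch; it assumes the minimal reproducible code above has been run, and keeps the original, unpatched zero-padding logic):

# With padding='valid', both models end up applying ZeroPadding1D to the same
# branch, so the build orders match and the optimizer-weight transfer succeeds
# even without the pseudolayer fix.
def make_model_valid(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x1 = Conv1D(16, 20, strides=200, padding='valid')(ipt)
    x1 = BatchNormalization()(x1)
    x2 = Conv1D(16, 200, strides=120, padding='valid')(ipt)
    x2 = BatchNormalization()(x2)
    diff = int(x2.shape[1]) - int(x1.shape[1])  # original zero_pad, unpatched
    if diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
    preout = GlobalAveragePooling1D()(concatenate([x1, x2]))
    out = Dense(1)(preout)
    model = Model(ipt, out)
    model.compile('adam', 'mse')
    return model

model_Av = make_model_valid(batch_shape_A)
model_Bv = make_model_valid(batch_shape_B)
model_Av.train_on_batch(*make_data(batch_shape_A))
model_Bv.train_on_batch(*make_data(batch_shape_B))
model_Bv.optimizer.set_weights(model_Av.optimizer.get_weights())
print("padding='valid': optimizer weights transferred without the fix")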