Model A:
ipt = Input(batch_shape=(32, 240, 4))
x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
x1 = BatchNormalization()(x1)
x2 = Conv1D(16, 200, strides=120, padding='same')(ipt)
x2 = BatchNormalization()(x2) # ...
Model B:
ipt = Input(batch_shape=(32, 250, 4))
x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
x1 = BatchNormalization()(x1)
x2 = Conv1D(16, 200, strides=120, padding='same')(ipt)
x2 = BatchNormalization()(x2) # ...
A's weights cannot be loaded onto B (images and code below). This is a small piece of a larger model whose timesteps parameter needs to change every X epochs, and ZeroPadding1D appears to change the layer build order when used; this does not affect the model weights, since those are mapped via a dictionary - but the optimizer weights are order-mapped, list to list.

Reproducible in TF1 and TF2, and with both keras and tf.keras imports. What is going on, and how can it be fixed? Relevant Git

Environment: Win-10 OS, CUDA 10.0.130, cuDNN 7.6.0, Python 3.7.4, GTX 1070
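For reference, here is a small helper (my own sketch, not part of the original setup) that makes the list-to-list mismatch visible once two models have been built and trained, e.g. model_A and model_B from the minimal code further down:

# Print the optimizer weight shapes of two trained models side by side to see
# where the list-to-list order diverges (illustrative helper, not from the post).
def compare_optimizer_weights(m1, m2):
    w1, w2 = m1.optimizer.get_weights(), m2.optimizer.get_weights()
    for i, (a, b) in enumerate(zip(w1, w2)):
        mark = "" if a.shape == b.shape else "  <-- shape mismatch"
        print(i, a.shape, b.shape, mark)

# usage, after calling train_on_batch on both models:
# compare_optimizer_weights(model_A, model_B)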
Observations:

- Happens with BatchNormalization - and with any number of layers before concatenate; the optimizer weights simply end up swapped in .get_weights()
- Also happens when varying strides instead of batch_shape[1]
- Also happens when used together with MaxPooling1D with strides > 1
- padding='valid' likewise leads to ZeroPadding1D, but does not change the build order (not sure why)

model_A.summary():
Layer (type)                    Output Shape      Param #     Connected to
==================================================================================
input_1 (InputLayer)            [(32, 240, 4)]    0
__________________________________________________________________________________
conv1d (Conv1D)                 (32, 2, 16)       1296        input_1[0][0]
__________________________________________________________________________________
conv1d_1 (Conv1D)               (32, 2, 16)       12816       input_1[0][0]
__________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)       64          conv1d[0][0]
__________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 2, 16)       64          conv1d_1[0][0]
__________________________________________________________________________________
concatenate (Concatenate)       (32, 2, 32)       0           bn_1[0][0]
                                                              bn_2[0][0]
__________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)          0           concatenate[0][0]
__________________________________________________________________________________
dense (Dense)                   (32, 1)           33          gap_0[0][0]
model_B.summary() (note the swapped layers):
input_2 (InputLayer)            [(32, 250, 4)]    0
_____________________________________________________________________________________
conv1d_2 (Conv1D)               (32, 2, 16)       1296        input_2[0][0]
_____________________________________________________________________________________
bn_1 (BatchNormalization)       (32, 2, 16)       64          conv1d_2[0][0]
_____________________________________________________________________________________
conv1d_3 (Conv1D)               (32, 3, 16)       12816       input_2[0][0]
_____________________________________________________________________________________
zero_padding1d (ZeroPadding1D)  (32, 3, 16)       0           bn_1[0][0]
_____________________________________________________________________________________
bn_2 (BatchNormalization)       (32, 3, 16)       64          conv1d_3[0][0]
_____________________________________________________________________________________
concatenate_1 (Concatenate)     (32, 3, 32)       0           zero_padding1d[0][0]
                                                              bn_2[0][0]
_____________________________________________________________________________________
gap_0 (GlobalAveragePooling1D)  (32, 32)          0           concatenate_1[0][0]
_____________________________________________________________________________________
dense_1 (Dense)                 (32, 1)           33          gap_0[0][0]
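The same swap also shows up programmatically; a quick check of my own, run after building both models with the code below:

# Layer build order, in the same order .summary() prints it:
print([layer.name for layer in model_A.layers])
print([layer.name for layer in model_B.layers])
# In model_B, the first branch's BatchNormalization is built before the
# second Conv1D, unlike in model_A.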
Minimal reproducible code:
# also works with `from keras`
from tensorflow.keras.layers import Input, Conv1D, ZeroPadding1D, concatenate
from tensorflow.keras.layers import BatchNormalization, Dense, GlobalAveragePooling1D
from tensorflow.keras.models import Model
import numpy as np
def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x1 = Conv1D(16, 20, strides=200, padding='same')(ipt)
    x1 = BatchNormalization()(x1)
    x2 = Conv1D(16, 200, strides=120, padding='same')(ipt)
    x2 = BatchNormalization()(x2)
    x1, x2 = zero_pad(x1, x2)
    preout = concatenate([x1, x2])
    preout = GlobalAveragePooling1D()(preout)
    out = Dense(1)(preout)
    model = Model(ipt, out)
    model.compile('adam', 'mse')
    return model

def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
    return x1, x2

def make_data(batch_shape):
    return (np.random.randn(*batch_shape),
            np.random.randint(0, 2, (batch_shape[0], 1)))
batch_shape_A = (32, 240, 4)
batch_shape_B = (32, 250, 4)
batch_shape_C = (32, 240, 4)
model_A = make_model(batch_shape_A)
model_B = make_model(batch_shape_B)
model_C = make_model(batch_shape_C) # 'control group'
x_A, y_A = make_data(batch_shape_A)
x_B, y_B = make_data(batch_shape_B)
x_C, y_C = make_data(batch_shape_C)
model_A.train_on_batch(x_A, y_A)
model_B.train_on_batch(x_B, y_B)
model_C.train_on_batch(x_C, y_C)
optimizer_weights_A = model_A.optimizer.get_weights()
model_C.optimizer.set_weights(optimizer_weights_A)
print("model_C optimizer weights set successfully")
model_B.optimizer.set_weights(optimizer_weights_A)
print("model_B optimizer weights set successfully") # will not print
Output:
model_C optimizer weights set successfully
ValueError: Optimizer weight shape (16,) not compatible with provided
weight shape (200, 4, 16)
Answer:
Found a workaround, and something of an explanation; it is not about ZeroPadding1D itself, but about one "branch" having an extra layer that the other does not - as revealed by plot_model(); see below.

Keras appears to build layers by traversing the graph vertically - note that the numbered layer graphs agree exactly with the .summary() order. The order change can still happen at the end of a "branch" - I suppose the reasoning is that the layer nodes of both branches should sit at the same depth before being merged into a common layer. However, this is not the whole story - see the disclaimer at the bottom.
Workaround: insert a "pseudolayer" into each branch to equalize the per-branch layer count; I'll stick with zero-padding:
def zero_pad(x1, x2):
    diff = int(x2.shape[1]) - int(x1.shape[1])
    if diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
        x2 = ZeroPadding1D((0, 0))(x2)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
        x1 = ZeroPadding1D((0, 0))(x1)
    return x1, x2
Running the code from the question:
model_C optimizer weights set successfully
model_B optimizer weights set successfully # SUCCESS
Model graphs: via from tensorflow.keras.utils import plot_model; plot_model(model_A), ...
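For example, to write the graphs to image files with output shapes shown (to_file and show_shapes are standard plot_model options; pydot and graphviz need to be installed):

from tensorflow.keras.utils import plot_model

# Save the layer graphs; show_shapes makes the extra ZeroPadding1D node and the
# branch depths easy to compare.
plot_model(model_A, to_file='model_A.png', show_shapes=True)
plot_model(model_B, to_file='model_B.png', show_shapes=True)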
Explanation disclaimer: I have not confirmed this against the exact source-code lines, and .summary() does not always agree with plot_model(); for example, with padding='valid' we get the plot_model graphs shown above for model_B and model_A, but the summary shows model_B's build order. Also, padding='valid' works without the fix, since both models then end up using ZeroPadding1D, so the layer structures are (superficially) identical.
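That last point can be checked with a small variation of the question's code (my sketch; it assumes the minimal reproducible code above has been run, and keeps the original, unpatched zero-padding logic):

# With padding='valid', both models end up applying ZeroPadding1D to the same
# branch, so the build orders match and the optimizer-weight transfer succeeds
# even without the pseudolayer fix.
def make_model_valid(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x1 = Conv1D(16, 20, strides=200, padding='valid')(ipt)
    x1 = BatchNormalization()(x1)
    x2 = Conv1D(16, 200, strides=120, padding='valid')(ipt)
    x2 = BatchNormalization()(x2)
    diff = int(x2.shape[1]) - int(x1.shape[1])  # original zero_pad, unpatched
    if diff > 0:
        x1 = ZeroPadding1D((diff, 0))(x1)
    elif diff < 0:
        x2 = ZeroPadding1D((abs(diff), 0))(x2)
    preout = GlobalAveragePooling1D()(concatenate([x1, x2]))
    out = Dense(1)(preout)
    model = Model(ipt, out)
    model.compile('adam', 'mse')
    return model

model_Av = make_model_valid(batch_shape_A)
model_Bv = make_model_valid(batch_shape_B)
model_Av.train_on_batch(*make_data(batch_shape_A))
model_Bv.train_on_batch(*make_data(batch_shape_B))
model_Bv.optimizer.set_weights(model_Av.optimizer.get_weights())
print("padding='valid': optimizer weights transferred without the fix")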