Suppose I have a model like this (a time-series forecasting model):
ipt = Input((data.shape[1], data.shape[2])) # 1
x = Conv1D(filters = 10, kernel_size = 3, padding = 'causal', activation = 'relu')(ipt) # 2
x = LSTM(15, return_sequences = False)(x) # 3
x = BatchNormalization()(x) # 4
out = Dense(1, activation = 'relu')(x) # 5
Now I would like to add batch normalization layers to this network. Considering the fact that batch normalization doesn't work with LSTM, can I add it before the Conv1D layer? I think it is reasonable to have a batch normalization layer after the LSTM.
Also, where could I add Dropout in this network? The same places (before or after the batch normalization)?
What about adding AveragePooling1D between the Conv1D and the LSTM? In that case, could I add batch normalization between the Conv1D and the AveragePooling1D without any effect on the LSTM layer?

Answer 0 (score: 3):
BatchNormalization can be used with LSTMs - the linked SO gives false advice; in fact, in my application of EEG classification, it dominated LayerNormalization. Now to your case:
- "Can I add it before the Conv1D"? Don't - instead, standardize your data beforehand; otherwise you're employing an inferior version of BatchNormalization to do the same thing (see the standardization sketch right after this list)
- Try BatchNormalization both before an activation and after - apply to both the Conv1D and the LSTM (a swapped-order variant of ConvBlock is sketched after the helper functions below)
- If your model is exactly as shown, BN after the LSTM may be counterproductive, as it can introduce noise that confuses the classifier layer - but this is about being one layer before the output, not about the LSTM itself
- If you aren't using a stacked LSTM with return_sequences=True preceding return_sequences=False, you can place Dropout anywhere - before the LSTM, after it, or both
- recurrent_dropout is still preferable to Dropout for the LSTM - however, you can do both; just do not use it with activation='relu', for which LSTM is unstable per a bug
- Any sort of Pooling is redundant here and may harm performance; scarce data is better transformed via a non-linearity than via simple averaging ops
- I recommend a SqueezeExcite block after your Conv; it's a form of self-attention - see paper; my implementation for the 1D case is below
- I also recommend trying activation='selu' with AlphaDropout and 'lecun_normal' initialization
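For the first point, a minimal sketch of standardizing the data beforehand (per-channel z-scoring); train_data and test_data are hypothetical (samples, timesteps, channels) arrays standing in for your own:

import numpy as np

# hypothetical stand-ins for your actual data
train_data = np.random.randn(100, 21, 20)  # (samples, timesteps, channels)
test_data  = np.random.randn(20,  21, 20)

# statistics from the training set only, applied to both splits,
# so no information leaks from the test set
mean = train_data.mean(axis=(0, 1), keepdims=True)
std  = train_data.std(axis=(0, 1), keepdims=True) + 1e-8  # guard against zero std

train_data = (train_data - mean) / std
test_data  = (test_data - mean) / std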
Below is an example template you can use as a starting point; I also recommend the following SOs for further reading: Regularizing RNNs, and Visualizing RNN gradients
from keras.layers import Input, Dense, LSTM, Conv1D, Activation
from keras.layers import AlphaDropout, BatchNormalization
from keras.layers import GlobalAveragePooling1D, Reshape, multiply
from keras.models import Model
import keras.backend as K
import numpy as np
def make_model(batch_shape):
    ipt = Input(batch_shape=batch_shape)
    x   = ConvBlock(ipt)
    x   = LSTM(16, return_sequences=False, recurrent_dropout=0.2)(x)
    # x = BatchNormalization()(x)  # may or may not work well
    out = Dense(1, activation='relu')(x)

    model = Model(ipt, out)
    model.compile('nadam', 'mse')
    return model
def make_data(batch_shape):  # toy data
    return (np.random.randn(*batch_shape),
            np.random.uniform(0, 2, (batch_shape[0], 1)))
batch_shape = (32, 21, 20)
model = make_model(batch_shape)
x, y = make_data(batch_shape)
model.train_on_batch(x, y)
Functions used:
def ConvBlock(_input):  # cleaner code
    x = Conv1D(filters=10, kernel_size=3, padding='causal', use_bias=False,
               kernel_initializer='lecun_normal')(_input)
    x = BatchNormalization(scale=False)(x)
    x = Activation('selu')(x)
    x = AlphaDropout(0.1)(x)
    out = SqueezeExcite(x)
    return out
def SqueezeExcite(_input, r=4):  # r == "reduction factor"; see paper
    filters = K.int_shape(_input)[-1]
    se = GlobalAveragePooling1D()(_input)
    se = Reshape((1, filters))(se)
    se = Dense(filters//r, activation='relu', use_bias=False,
               kernel_initializer='he_normal')(se)
    se = Dense(filters, activation='sigmoid', use_bias=False,
               kernel_initializer='he_normal')(se)
    return multiply([_input, se])
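For the "BN before vs. after an activation" bullet above, a hedged variant of ConvBlock with the order swapped; the name ConvBlockBNAfter is made up, and which ordering works better is an empirical question:

def ConvBlockBNAfter(_input):  # hypothetical variant, not part of the original answer
    x = Conv1D(filters=10, kernel_size=3, padding='causal', use_bias=False,
               kernel_initializer='lecun_normal')(_input)
    x = Activation('selu')(x)
    x = BatchNormalization()(x)  # BN after the activation this time
    x = AlphaDropout(0.1)(x)
    return SqueezeExcite(x)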
Spatial Dropout: pass noise_shape = (batch_size, 1, channels) to Dropout - it has the effect shown below; for the code, see the Git gist:
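A minimal sketch of the idea, assuming channels-last inputs; noise_shape is a standard argument of Keras' Dropout, while the surrounding names are made up:

from keras.layers import Input, Dropout
from keras.models import Model

batch_size, timesteps, channels = 32, 21, 10
ipt = Input(batch_shape=(batch_size, timesteps, channels))

# the dropout mask has shape (batch_size, 1, channels), so it is
# broadcast along the time axis: entire channels are dropped at once
# instead of scattered individual activations
out = Dropout(0.2, noise_shape=(batch_size, 1, channels))(ipt)
model = Model(ipt, out)

Keras also ships SpatialDropout1D, which performs the same per-channel masking without hard-coding the batch size.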