Huge difference between two different implementations of the same model

Date: 2018-05-07 15:19:32

Tags: tensorflow deep-learning keras

I implemented the same kind of model in two different ways:

Method 1

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dense, Flatten

loss_function = 'mean_squared_error'
optimizer = 'Adagrad'
batch_size = 256
nr_of_epochs = 80

model = Sequential()
model.add(Conv1D(60, 32, strides=1, activation='relu', padding='causal', input_shape=(64, 1)))
model.add(Conv1D(80, 10, strides=1, activation='relu', padding='causal'))
model.add(Conv1D(100, 5, strides=1, activation='relu', padding='causal'))
model.add(MaxPooling1D(2))
model.add(Dense(300, activation='relu'))
model.add(Flatten())
model.add(Dense(1, activation='relu'))
print(model.summary())

model.compile(loss=loss_function, optimizer=optimizer, metrics=['mse', 'mae'])
history = model.fit(X_train, Y_train, batch_size=batch_size, validation_data=(X_val, Y_val), shuffle=True, epochs=nr_of_epochs, verbose=2)

Method 2

from keras.models import Model
from keras.layers import Input, Conv1D, MaxPooling1D, Dense, Flatten

inputs = Input(shape=(64, 1))
outX = Conv1D(60, 32, strides=1, activation='relu', padding='causal')(inputs)
outX = Conv1D(80, 10, activation='relu', padding='causal')(outX)
outX = Conv1D(100, 5, activation='relu', padding='causal')(outX)
outX = MaxPooling1D(2)(outX)
outX = Dense(300, activation='relu')(outX)
outX = Flatten()(outX)
predictions = Dense(1, activation='linear')(outX)
model = Model(inputs=[inputs], outputs=predictions)
print(model.summary())

model.compile(loss=loss_function, optimizer=optimizer, metrics=['mse', 'mae'])
history = model.fit(X_train, Y_train, batch_size=batch_size, validation_data=(X_val, Y_val), shuffle=True, epochs=nr_of_epochs, verbose=2)

The two models should have identical architectures; see the images below.

Method 1

(screenshot of the model.summary() output for Method 1)

Method 2

(screenshot of the model.summary() output for Method 2)
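
Beyond eyeballing the summaries, the layer configurations can be compared programmatically (a sketch; model_1 and model_2 are assumed names for the Sequential and functional models above, and the functional model's leading InputLayer is skipped so the layer lists line up):

# Compare the two architectures layer by layer; prints False for any
# layer whose configuration (filters, kernel size, activation, ...) differs.
skip = ('name', 'batch_input_shape', 'dtype')  # bookkeeping keys that differ anyway
for l1, l2 in zip(model_1.layers, model_2.layers[1:]):
    c1 = {k: v for k, v in l1.get_config().items() if k not in skip}
    c2 = {k: v for k, v in l2.get_config().items() if k not in skip}
    print(l1.__class__.__name__, c1 == c2)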

Even though their architectures are the same, when I train them on exactly the same dataset the training processes differ drastically. In the first implementation the loss stops decreasing after one epoch, while the second implementation converges to a reasonable training loss. Why is there such a big difference?

Method 1

625s - loss: 0.0670 - mean_squared_error: 0.0670 - mean_absolute_error: 0.0647 - val_loss: 0.0653 - val_mean_squared_error: 0.0653 - val_mean_absolute_error: 0.0646                                                                                                                                  
Epoch 2/120                                                                                                                                        
624s - loss: 0.0647 - mean_squared_error: 0.0647 - mean_absolute_error: 0.0641 - val_loss: 0.0653 - val_mean_squared_error: 0.0653 - val_mean_absolute_error: 0.0646                                                                                                                                  
Epoch 3/120                                                                                                                                        
624s - loss: 0.0647 - mean_squared_error: 0.0647 - mean_absolute_error: 0.0641 - val_loss: 0.0653 - val_mean_squared_error: 0.0653 - val_mean_absolute_error: 0.0646                                                                                                                                  
Epoch 4/120                                                                                                                                        
625s - loss: 0.0647 - mean_squared_error: 0.0647 - mean_absolute_error: 0.0641 - val_loss: 0.0653 - val_mean_squared_error: 0.0653 - val_mean_absolute_error: 0.0646                                                                                                                                  
Epoch 5/120                                                                                                                                        
624s - loss: 0.0647 - mean_squared_error: 0.0647 - mean_absolute_error: 0.0641 - val_loss: 0.0653 - val_mean_squared_error: 0.0653 - val_mean_absolute_error: 0.0646                                                                                                                                  
Epoch 6/120                                                                                                                                        
622s - loss: 0.0647 - mean_squared_error: 0.0647 - mean_absolute_error: 0.0641 - val_loss: 0.0653 - val_mean_squared_error: 0.0653 - val_mean_absolute_error: 0.0646                       

Method 2

429s - loss: 0.0623 - mean_squared_error: 0.0623 - mean_absolute_error: 0.1013 - val_loss: 0.0505 - val_mean_squared_error: 0.0505 - val_mean_absolute_error: 0.1006                                                                                                                                  
Epoch 2/80                                                                                                                                         
429s - loss: 0.0507 - mean_squared_error: 0.0507 - mean_absolute_error: 0.0977 - val_loss: 0.0504 - val_mean_squared_error: 0.0504 - val_mean_absolute_error: 0.0988                                                                                                                                  
Epoch 3/80                                                                                                                                         
429s - loss: 0.0503 - mean_squared_error: 0.0503 - mean_absolute_error: 0.0964 - val_loss: 0.0498 - val_mean_squared_error: 0.0498 - val_mean_absolute_error: 0.0954                                                                                                                                  
Epoch 4/80                                                                                                                                         
428s - loss: 0.0501 - mean_squared_error: 0.0501 - mean_absolute_error: 0.0955 - val_loss: 0.0498 - val_mean_squared_error: 0.0498 - val_mean_absolute_error: 0.0962                                                                                                                                  
Epoch 5/80                                                                                                                                         
429s - loss: 0.0499 - mean_squared_error: 0.0499 - mean_absolute_error: 0.0951 - val_loss: 0.0501 - val_mean_squared_error: 0.0501 - val_mean_absolute_error: 0.0960                                                                                                                                  
Epoch 6/80                                                                                                                                         
430s - loss: 0.0498 - mean_squared_error: 0.0498 - mean_absolute_error: 0.0947 - val_loss: 0.0495 - val_mean_squared_error: 0.0495 - val_mean_absolute_error: 0.0941           

1 Answer:

Answer 0 (score: 2)

The activation of the last layer differs: 'relu' vs. 'linear'.

That alone produces very different results. (relu can never output negative values.)
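
To illustrate the clipping (a tiny sketch with made-up pre-activation values, not taken from the question):

import numpy as np

z = np.array([-1.5, -0.2, 0.3, 2.0])  # hypothetical pre-activations
print(np.maximum(z, 0.0))  # 'relu': [0.  0.  0.3 2. ] -- negatives are clipped to zero
print(z)                   # 'linear': all values pass through unchanged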

Besides that, there is a lot of "luck" involved, especially when using "relu" throughout the whole model.

The weights of each model are randomly initialized, so the two models are not "the same" (unless you force the weights from one model onto the other with model.get_weights() and model.set_weights()). And "relu" is an activation that must be used with care: too large a learning rate can quickly drive all outputs to zero, stopping learning before the model has really learned anything.
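
A minimal sketch of both remedies (assuming the two models from the question are held in variables model_1 and model_2; the learning rate 0.001 is only an illustrative value):

from keras.optimizers import Adagrad

# Start both models from identical weights to rule out initialization luck
# (possible here because the two architectures have matching layer shapes)
model_2.set_weights(model_1.get_weights())

# A smaller learning rate makes it less likely that 'relu' units die early
model_1.compile(loss='mean_squared_error',
                optimizer=Adagrad(lr=0.001),
                metrics=['mse', 'mae'])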

Is this a binary classification model? If so, use "sigmoid" in the last layer.
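
I.e., something along these lines (a sketch; binary_crossentropy is the usual companion loss for a sigmoid output):

predictions = Dense(1, activation='sigmoid')(outX)
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])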