Why does loading saved weights into a model cause problems?

Date: 2019-01-21 13:48:37

Tags: tensorflow keras google-colaboratory

I'm trying to modify a classifier model with a number of techniques (dropout, autoencoders, etc.) to analyze which gives the best results. To do so, I'm using the `save_weights` and `load_weights` methods.

The first time I run the model, it works fine. But after loading the weights, `fit` doesn't do anything: the loss stagnates throughout training.

I know I must be doing something wrong, but I can't figure out what. My first thought was a vanishing-gradient problem, since I first ran into the issue with an autoencoded dataset. But after much tweaking and testing, I think the problem lies in the weight loading. See for yourself (this is after a runtime restart, obviously):

# Classifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(50, activation='relu', input_dim=x.shape[1]))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart')

First fit (loading the initial weights. Yes, I know the initial weights are already in place at this point, but I left that line in to show it isn't the problem):

model.load_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart')

model.fit(x, y_train, epochs=10, batch_size=20, validation_split=0.15)

model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneNormal')

Result:

Train on 35700 samples, validate on 6300 samples
Epoch 1/10
35700/35700 [==============================] - 5s 128us/step - loss: 1.0875 - acc: 0.8036 - val_loss: 0.3275 - val_acc: 0.9067
Epoch 2/10
35700/35700 [==============================] - 4s 120us/step - loss: 0.2792 - acc: 0.9201 - val_loss: 0.3186 - val_acc: 0.9079
Epoch 3/10
35700/35700 [==============================] - 4s 122us/step - loss: 0.2255 - acc: 0.9357 - val_loss: 0.1918 - val_acc: 0.9444
Epoch 4/10
35700/35700 [==============================] - 4s 121us/step - loss: 0.1777 - acc: 0.9499 - val_loss: 0.1977 - val_acc: 0.9465
Epoch 5/10
35700/35700 [==============================] - 4s 121us/step - loss: 0.1530 - acc: 0.9549 - val_loss: 0.1718 - val_acc: 0.9478
Epoch 6/10
35700/35700 [==============================] - 4s 121us/step - loss: 0.1402 - acc: 0.9595 - val_loss: 0.1847 - val_acc: 0.9510
Epoch 7/10
35700/35700 [==============================] - 4s 122us/step - loss: 0.1236 - acc: 0.9637 - val_loss: 0.1675 - val_acc: 0.9546
Epoch 8/10
35700/35700 [==============================] - 4s 121us/step - loss: 0.1160 - acc: 0.9660 - val_loss: 0.1776 - val_acc: 0.9586
Epoch 9/10
35700/35700 [==============================] - 4s 120us/step - loss: 0.1109 - acc: 0.9683 - val_loss: 0.1928 - val_acc: 0.9492
Epoch 10/10
35700/35700 [==============================] - 4s 120us/step - loss: 0.1040 - acc: 0.9701 - val_loss: 0.1749 - val_acc: 0.9570
WARNING:tensorflow:This model was compiled with a Keras optimizer (<tensorflow.python.keras.optimizers.Adam object at 0x7fb76ca35080>) but is being saved in TensorFlow format with `save_weights`. The model's weights will be saved, but unlike with TensorFlow optimizers in the TensorFlow format the optimizer's state will not be saved.

Consider using a TensorFlow optimizer from `tf.train`.

Second training run (loading the initial weights first, then fitting):

model.load_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneStart')

model.fit(x, y_train, epochs=10, batch_size=20, validation_split=0.15)

model.save_weights('/content/drive/My Drive/Colab Notebooks/Weights/KagTPOneNormal')

Result:

Train on 35700 samples, validate on 6300 samples
Epoch 1/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.4847 - acc: 0.1011 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 2/10
35700/35700 [==============================] - 4s 122us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 3/10
35700/35700 [==============================] - 4s 120us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 4/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 5/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 6/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 7/10
35700/35700 [==============================] - 4s 122us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 8/10
35700/35700 [==============================] - 4s 121us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 9/10
35700/35700 [==============================] - 4s 122us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
Epoch 10/10
35700/35700 [==============================] - 5s 130us/step - loss: 14.5018 - acc: 0.1003 - val_loss: 14.5907 - val_acc: 0.0948
WARNING:tensorflow:This model was compiled with a Keras optimizer (<tensorflow.python.keras.optimizers.Adam object at 0x7fb76ca35080>) but is being saved in TensorFlow format with `save_weights`. The model's weights will be saved, but unlike with TensorFlow optimizers in the TensorFlow format the optimizer's state will not be saved.

Consider using a TensorFlow optimizer from `tf.train`.

Thanks in advance for any help :)

PS: Here is the data for reference, though I really don't think it's the problem. It's the MNIST-like dataset Google provides on Kaggle. (I believe it's just MNIST, minus some of the examples):

import pandas as pd
from keras.utils import np_utils

df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/IA/Kaggle TP1/train.csv')
data = df.values
data.shape        # (42000, 785)
y = data[:, 0]
y_train = np_utils.to_categorical(y, 10)
x = data[:, 1:]
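As a sanity check on the shapes involved, the label encoding can be reproduced in plain NumPy (using dummy rows here, not the actual Kaggle CSV; `np.eye` indexing is a stand-in for `np_utils.to_categorical`):

```python
import numpy as np

# Hypothetical stand-in for two rows of the CSV: a label (0-9)
# followed by 784 pixel values.
data = np.array([[3] + [0] * 784,
                 [7] + [255] * 784])
y = data[:, 0]        # labels
x = data[:, 1:]       # pixels

# Pure-NumPy equivalent of np_utils.to_categorical(y, 10):
y_onehot = np.eye(10)[y]

print(x.shape)        # (2, 784)
print(y_onehot.shape) # (2, 10)
```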

1 Answer:

Answer 0 (score: 1)

To restart training of a model that has already been used with `fit()`, you must recompile it:

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

The reason is that the model has an optimizer assigned to it, and that optimizer is already in some state. This state reflects training progress, so unless you recompile the model, training simply continues from that state. If your model really got stuck during the first training run, it will almost certainly stay stuck (learning rate too low, etc.).

`compile` defines the loss function, optimizer, and metrics; it has nothing to do with the weights assigned to the layers.
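The stale-state effect can be illustrated with a toy Adam-style optimizer (a simplified sketch, not Keras' actual implementation): the step counter and moment buffers survive a `load_weights` call, whereas recompiling builds a fresh optimizer with zeroed state.

```python
import numpy as np

class AdamState:
    """Toy sketch of the per-parameter state an Adam optimizer keeps."""
    def __init__(self, shape, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0                   # step counter
        self.m = np.zeros(shape)     # first-moment estimate
        self.v = np.zeros(shape)     # second-moment estimate

    def step(self, w, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Simulate a first training run: the optimizer accumulates state.
opt = AdamState(shape=(3,))
w = np.ones(3)
for _ in range(100):
    w = opt.step(w, grad=np.sign(w))   # pretend gradients
print(opt.t)    # 100 -- counter and moments tuned to the old trajectory

# Loading fresh weights does NOT touch opt.t / opt.m / opt.v.
# Recompiling effectively replaces opt with a brand-new instance:
fresh = AdamState(shape=(3,))
print(fresh.t)  # 0 -- clean state, as after model.compile(...)
```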