First layer weights unchanged after training with Keras

Date: 2019-04-26 21:07:32

Tags: tensorflow keras training-data

This question has been discussed before, but those discussions usually converge on vanishing gradients as the root cause.

In my model, however, there are only two hidden layers, so it is unlikely to be stuck in vanishing gradients. Here is the code:

from __future__ import print_function

import numpy as np

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

batch_size = 128
num_classes = 10
epochs = 20

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', kernel_initializer='random_uniform', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, kernel_initializer='random_uniform', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, kernel_initializer='random_uniform', activation='softmax'))


# Dump the shape and statistics of every weight tensor before training.
weights = model.get_weights()
print(len(weights))
for i in range(6):
    print(str(i), "th layer shape: ", weights[i].shape, len(weights[i]),
          "mean: ", np.mean(weights[i]), "std dev: ", np.std(weights[i]))
    print("Before Training")
    print(weights[i][0])


class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

batch_history = LossHistory()

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    callbacks=[batch_history])

# Dump the same statistics again; re-fetch the weights updated by training.
weights = model.get_weights()
for i in range(6):
    print(str(i), "th layer shape: ", weights[i].shape, len(weights[i]),
          "mean: ", np.mean(weights[i]), "std dev: ", np.std(weights[i]))
    print("After Training")
    print(weights[i][0])

I took screenshots of the weights before and after training. In short, the first layer's weights do not change after training, while the second layer's weights do. (Since there are so many parameters, I only show part of the first row of each weight matrix.)

First layer, before training: [screenshot]

First layer, after training: [screenshot]

Second layer, before training: [screenshot]
Second layer, after training: [screenshot]
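
Rather than eyeballing screenshots, the change can be checked numerically by snapshotting the weights before fitting and diffing them afterwards. A minimal sketch, reusing the model and data from the code above (w_before and w_after are names introduced here for illustration):

import numpy as np

# Snapshot every weight tensor before training so it can be diffed later.
w_before = [w.copy() for w in model.get_weights()]
model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
w_after = model.get_weights()

# Report, per tensor, whether anything moved and by how much at most.
for i, (wb, wa) in enumerate(zip(w_before, w_after)):
    print(i, "th tensor changed:", not np.allclose(wb, wa),
          " max abs diff:", np.abs(wa - wb).max())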

1 Answer:

Answer 0 (score: 0)

After some debugging, I realized that the weight matrix does change after training, even though the first row of the (784, 512) matrix (the row shown in the screenshots) appears never to change.

The reason is that I am using the preprocessed MNIST data, a dataset of handwritten digit images in which only the inked regions carry non-zero grayscale pixel values; the border regions are all 0. In particular, the first row of every image's 2D matrix is always 0. Inside the first hidden layer's weight matrix, the first entry w_i1 of each of the 512 weight vectors is updated by (dJ/da_i) * (da_i/dw_i1), where (da_i/dw_i1) = x_1, and x_1 is 0 in every training sample as noted above. The gradient for that entry is therefore always zero, so it never moves from its initial value.
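
This is easy to verify: the top-left pixel (x_1, i.e. column 0 of the flattened input) is 0 in every training image, so the corresponding kernel entries keep their random_uniform initial values even after 20 epochs. A minimal sketch, reusing the variables from the question:

import numpy as np

# Column 0 of the flattened input is identically zero across the training set,
# since the border pixels of MNIST digits are always blank.
print(np.all(x_train[:, 0] == 0))  # True

# With a zero input, the gradient (dJ/da_i) * x_1 vanishes for every sample,
# so row 0 of the (784, 512) kernel never receives an update.
w_first = model.get_weights()[0]
print(w_first[0][:5])  # still the initial random_uniform values after training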