MLP gives completely different results for Keras and scikit-learn

Asked: 2018-07-23 15:27:42

Tags: tensorflow scikit-learn keras

Running a single-hidden-layer MLP on MNIST, I get drastically different results for Keras and sklearn.

import numpy as np
np.random.seed(5)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '-1'
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn.neural_network import MLPClassifier

(x_train, y_train), (x_test, y_test) = mnist.load_data()

num_classes = 10
batch_data = x_train[:2000]
batch_labels = y_train[:2000]

# flatten the 2d images to 784-dim vectors
batch_data_flat = batch_data.reshape(2000, 784)

# one-hot encoding
batch_labels_one_hot = np_utils.to_categorical(batch_labels, num_classes)

num_hidden_nodes = 100
alpha = 0.0001
batch_size = 128
beta_1 = 0.9
beta_2 = 0.999
epsilon = 1e-08
learning_rate_init = 0.001
epochs = 200

# keras
keras_model = Sequential()
keras_model.add(Dense(num_hidden_nodes, activation='relu',
                      kernel_regularizer=regularizers.l2(alpha),
                      kernel_initializer='glorot_uniform',
                      bias_initializer='glorot_uniform'))
keras_model.add(Dense(num_classes, activation='softmax',
                      kernel_regularizer=regularizers.l2(alpha),
                      kernel_initializer='glorot_uniform',
                      bias_initializer='glorot_uniform'))

keras_optim = Adam(lr=learning_rate_init, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon)
keras_model.compile(optimizer=keras_optim, loss='categorical_crossentropy', metrics=['accuracy'])

keras_model.fit(batch_data_flat, batch_labels_one_hot, batch_size=batch_size, epochs=epochs, verbose=0)

# sklearn
sklearn_model = MLPClassifier(hidden_layer_sizes=(num_hidden_nodes,), activation='relu', solver='adam',
                              alpha=alpha, batch_size=batch_size, learning_rate_init=learning_rate_init,
                              max_iter=epochs, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon)

sklearn_model.fit(batch_data_flat, batch_labels_one_hot)

# evaluate both on their training data
score_keras = keras_model.evaluate(batch_data_flat, batch_labels_one_hot)
score_sklearn = sklearn_model.score(batch_data_flat, batch_labels_one_hot)
print("Acc: keras %f, sklearn %f" % (score_keras[1], score_sklearn))

Output: Acc: keras 0.182500, sklearn 1.000000

The only difference I see is that scikit-learn computes the Glorot initialization of the last layer with sqrt(2 / (fan_in + fan_out)), while Keras computes it with sqrt(6 / (fan_in + fan_out)). But I don't think that can cause such a large difference. Am I forgetting something here?
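For reference, the two scale factors in question can be compared directly. This is a quick sketch using the output layer's dimensions from the code above (fan_in = 100 hidden units, fan_out = 10 classes); the gap is well under a factor of two:

```python
import math

fan_in, fan_out = 100, 10  # output layer of the models above

# scale the question attributes to scikit-learn vs. Keras
scale_sklearn = math.sqrt(2.0 / (fan_in + fan_out))  # ~0.135
scale_keras = math.sqrt(6.0 / (fan_in + fan_out))    # ~0.234

print(scale_sklearn, scale_keras, scale_keras / scale_sklearn)
```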

scikit-learn 0.19.1, Keras 2.2.0 (TensorFlow 1.9.0 backend)

1 Answer:

Answer 0 (score: 0)

You should probably initialize the biases with 'zeros' rather than 'glorot_uniform'.
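A minimal sketch of that change, assuming a current Keras installation (layer sizes taken from the question's code; `'zeros'` is the Keras default for `bias_initializer`, so it can also simply be omitted):

```python
import numpy as np
from keras import Input, Sequential
from keras.layers import Dense

model = Sequential([
    Input(shape=(784,)),
    Dense(100, activation='relu',
          kernel_initializer='glorot_uniform',
          bias_initializer='zeros'),   # zeros instead of glorot_uniform
    Dense(10, activation='softmax',
          kernel_initializer='glorot_uniform',
          bias_initializer='zeros'),
])

# both bias vectors start at exactly zero
print(np.allclose(model.layers[0].get_weights()[1], 0))  # True
print(np.allclose(model.layers[1].get_weights()[1], 0))  # True
```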