I've recently been experimenting with TensorFlow's high-level APIs and got some strange results: when I train what appear to be identical models with the same hyperparameters using the Keras Model API and the TensorFlow Estimator API, I get different results (Keras gives about 4% higher accuracy).
Here is my code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization, Activation, Flatten
from tensorflow.keras.initializers import VarianceScaling
from tensorflow.keras.optimizers import Adam
# Load CIFAR-10 dataset and normalize pixel values
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = np.array(X_train, dtype=np.float32)
y_train = np.array(y_train, dtype=np.int32).reshape(-1)
X_test = np.array(X_test, dtype=np.float32)
y_test = np.array(y_test, dtype=np.int32).reshape(-1)
mean = X_train.mean(axis=(0, 1, 2), keepdims=True)
std = X_train.std(axis=(0, 1, 2), keepdims=True)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test_one_hot = tf.keras.utils.to_categorical(y_test, num_classes=10)
# Define forward pass for a convolutional neural network.
# This function takes a batch of images as input and returns
# unscaled class scores (aka logits) from the last layer
def conv_net(X):
    initializer = VarianceScaling(scale=2.0)
    X = Conv2D(filters=32, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = MaxPooling2D()(X)
    X = Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=128, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=256, kernel_size=3, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = GlobalAveragePooling2D()(X)
    X = Dense(10)(X)
    return X
# For training this model I use the Adam optimizer with learning_rate=3e-3
# Train the model for 10 epochs using keras.Model API
def keras_model():
    inputs = Input(shape=(32, 32, 3))
    scores = conv_net(inputs)
    outputs = Activation('softmax')(scores)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=Adam(lr=3e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
model1 = keras_model()
model1.fit(X_train, y_train_one_hot, batch_size=128, epochs=10)
results1 = model1.evaluate(X_test, y_test_one_hot)
print(results1)
# The above usually gives 79-82% accuracy
# Now train the same model for 10 epochs using tf.estimator.Estimator API
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train,
                                                    batch_size=128, num_epochs=10, shuffle=True)
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test,
                                                   batch_size=128, num_epochs=1, shuffle=False)
def tf_estimator(features, labels, mode, params):
    X = features['X']
    scores = conv_net(X)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions={'scores': scores})
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=scores, labels=labels)
    metrics = {'accuracy': tf.metrics.accuracy(labels=labels, predictions=tf.argmax(scores, axis=-1))}
    optimizer = tf.train.AdamOptimizer(learning_rate=params['lr'], epsilon=params['epsilon'])
    step = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=tf.reduce_mean(loss), train_op=step, eval_metric_ops=metrics)
model2 = tf.estimator.Estimator(model_fn=tf_estimator, params={'lr': 3e-3, 'epsilon': tf.keras.backend.epsilon()})
model2.train(input_fn=train_input_fn)
results2 = model2.evaluate(input_fn=test_input_fn)
print(results2)
# This usually gives 75-78% accuracy
print('Keras accuracy:', results1[1])
print('Estimator accuracy:', results2['accuracy'])
I trained both models 30 times, for 10 epochs each: the model trained with Keras reached a mean accuracy of 0.8035, the model trained with the Estimator 0.7631 (standard deviations 0.0065 and 0.0072 respectively). Accuracy is consistently higher when I use Keras. My question is: why? Am I doing something wrong, or missing some important parameter? The architecture is identical in both cases and I'm using the same hyperparameters (the one exception is that at first I didn't set Adam's epsilon to the same value, but that doesn't really affect the result), yet the accuracies differ substantially.
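For reference, the per-run statistics were collected with a loop roughly like the sketch below; run_keras and run_estimator are hypothetical wrappers around the two training scripts above, each building and training a fresh model from scratch and returning the final test accuracy.
# Sketch of the repeated-trials comparison (run_keras/run_estimator are
# hypothetical wrappers; each rebuilds the model so the 30 runs are independent)
keras_accs = [run_keras() for _ in range(30)]
est_accs = [run_estimator() for _ in range(30)]
print('Keras:     mean=%.4f, std=%.4f' % (np.mean(keras_accs), np.std(keras_accs)))
print('Estimator: mean=%.4f, std=%.4f' % (np.mean(est_accs), np.std(est_accs)))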
I also wrote the training loop in raw TensorFlow and got the same accuracy as with the Estimator API (i.e. lower than Keras). This made me think that some parameter in Keras has a different default value than in TensorFlow, but as far as I can tell they are all the same.
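That raw-TensorFlow loop was essentially the following (a simplified sketch of what I ran; the placeholder names are mine):
# Simplified sketch of the plain-TensorFlow (TF 1.x graph mode) training loop
X_ph = tf.placeholder(tf.float32, shape=(None, 32, 32, 3))
y_ph = tf.placeholder(tf.int32, shape=(None,))
scores = conv_net(X_ph)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=scores, labels=y_ph))
train_op = tf.train.AdamOptimizer(learning_rate=3e-3).minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(10):
        # shuffle the training set each epoch, then iterate in batches of 128
        idx = np.random.permutation(len(X_train))
        for i in range(0, len(X_train), 128):
            batch = idx[i:i + 128]
            sess.run(train_op, feed_dict={X_ph: X_train[batch], y_ph: y_train[batch]})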
I also tried other architectures. Sometimes the accuracy gap was smaller, but I couldn't attribute the difference to any particular layer type. It seems the gap usually shrinks with shallower networks, though not always. For example, with the following model the accuracy difference is even larger:
def simple_conv_net(X):
    initializer = VarianceScaling(scale=2.0)
    X = Conv2D(filters=32, kernel_size=5, strides=2, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Conv2D(filters=64, kernel_size=3, strides=1, padding='valid', activation='relu', kernel_initializer=initializer)(X)
    X = BatchNormalization()(X)
    X = Flatten()(X)
    X = Dense(10)(X)
    return X
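The training code is unchanged apart from swapping in simple_conv_net; for example, the Keras version becomes (model3 is just an illustrative name):
# Same Keras training setup with the shallower network swapped in
inputs = Input(shape=(32, 32, 3))
outputs = Activation('softmax')(simple_conv_net(inputs))
model3 = Model(inputs=inputs, outputs=outputs)
model3.compile(optimizer=Adam(lr=3e-3),
               loss='categorical_crossentropy',
               metrics=['accuracy'])
model3.fit(X_train, y_train_one_hot, batch_size=128, epochs=10)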
Again, I trained it with the Adam optimizer and a 3e-3 learning rate, 30 times for 10 epochs each. The mean accuracy was 0.6561 for Keras and 0.6101 for the Estimator (standard deviations 0.0084 and 0.0111 respectively). What could be causing such a difference?