I wrote this very simple code:
model = keras.models.Sequential()
model.add(layers.Dense(13000, input_dim=X_train.shape[1], activation='relu', trainable=False))
model.add(layers.Dense(1, input_dim=13000, activation='linear'))
model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1)
The data is MNIST, but restricted to the digits "0" and "1" only. I am running into a very strange problem: the loss decreases monotonically toward zero, as expected, but the accuracy, instead of improving, keeps getting worse.
Here is some sample output:
12665/12665 [==============================] - 0s 11us/step - loss: 0.0107 - accuracy: 0.2355
Epoch 181/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0114 - accuracy: 0.2568
Epoch 182/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0128 - accuracy: 0.2726
Epoch 183/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0133 - accuracy: 0.2839
Epoch 184/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0134 - accuracy: 0.2887
Epoch 185/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0110 - accuracy: 0.2842
Epoch 186/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0101 - accuracy: 0.2722
Epoch 187/1000000
12665/12665 [==============================] - 0s 11us/step - loss: 0.0094 - accuracy: 0.2583
Since we only have two classes, the baseline for the worst possible accuracy should be 0.5. Moreover, we are monitoring accuracy on the training set, so it should climb all the way to 100%; I expect overfitting, and judging by the loss function that is exactly what is happening.
By the final epoch, this is the situation:
12665/12665 [==============================] - 0s 11us/step - loss: 9.9710e-06 - accuracy: 0.0758
7% accuracy, when the worst theoretical possibility from random guessing is 50%. This is not by chance; something is going on here.
Can anyone see the problem?
Full code:
from tensorflow import keras
import numpy as np
from matplotlib import pyplot as plt
import keras
from keras.callbacks import Callback
from keras import layers
import warnings
class EarlyStoppingByLossVal(Callback):
    def __init__(self, monitor='val_loss', value=0.00001, verbose=0):
        super(EarlyStoppingByLossVal, self).__init__()
        self.monitor = monitor
        self.value = value
        self.verbose = verbose

    def on_epoch_end(self, epoch, logs={}):
        current = logs.get(self.monitor)
        if current is None:
            warnings.warn("Early stopping requires %s available!" % self.monitor, RuntimeWarning)
        if current < self.value:
            if self.verbose > 0:
                print("Epoch %05d: early stopping THR" % epoch)
            self.model.stop_training = True
def load_mnist():
    mnist = keras.datasets.mnist
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
    # flatten the 28x28 images into 784-dimensional vectors
    train_images = np.reshape(train_images, (train_images.shape[0], train_images.shape[1] * train_images.shape[2]))
    test_images = np.reshape(test_images, (test_images.shape[0], test_images.shape[1] * test_images.shape[2]))
    train_labels = np.reshape(train_labels, (train_labels.shape[0],))
    test_labels = np.reshape(test_labels, (test_labels.shape[0],))
    # keep only the digits "0" and "1"
    train_images = train_images[(train_labels == 0) | (train_labels == 1)]
    test_images = test_images[(test_labels == 0) | (test_labels == 1)]
    train_labels = train_labels[(train_labels == 0) | (train_labels == 1)]
    test_labels = test_labels[(test_labels == 0) | (test_labels == 1)]
    train_images, test_images = train_images / 255, test_images / 255
    return train_images, train_labels, test_images, test_labels
X_train, y_train, X_test, y_test = load_mnist()
train_acc = []
train_errors = []
test_acc = []
test_errors = []
width_list = [13000]
for width in width_list:
    print(width)
    model = keras.models.Sequential()
    model.add(layers.Dense(width, input_dim=X_train.shape[1], activation='relu', trainable=False))
    model.add(layers.Dense(1, input_dim=width, activation='linear'))
    model.compile(loss="binary_crossentropy", optimizer='adam', metrics=["accuracy"])
    callbacks = [EarlyStoppingByLossVal(monitor='loss', value=0.00001, verbose=1)]
    model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000000, verbose=1, callbacks=callbacks)
    train_errors.append(model.evaluate(X_train, y_train)[0])
    test_errors.append(model.evaluate(X_test, y_test)[0])
    train_acc.append(model.evaluate(X_train, y_train)[1])
    test_acc.append(model.evaluate(X_test, y_test)[1])
plt.plot(width_list, train_errors, marker='D')
plt.xlabel("width")
plt.ylabel("train loss")
plt.show()
plt.plot(width_list, test_errors, marker='D')
plt.xlabel("width")
plt.ylabel("test loss")
plt.show()
plt.plot(width_list, train_acc, marker='D')
plt.xlabel("width")
plt.ylabel("train acc")
plt.show()
plt.plot(width_list, test_acc, marker='D')
plt.xlabel("width")
plt.ylabel("test acc")
plt.show()
Answer 0 (score: 1)
A linear activation in the last layer makes no sense for a (binary) classification problem; change your last layer to:
model.add(layers.Dense(1, input_dim=width, activation='sigmoid'))
Linear activation in the last layer is meant for regression problems, not classification ones. It also explains the symptom here: with an unbounded linear output, Keras's binary accuracy metric in effect rounds the raw prediction before comparing it with the 0/1 label, so a prediction like 3.7 for a "1" sample counts as wrong even though the clipped cross-entropy loss for it is near zero. That is how the loss can go to zero while the reported accuracy collapses.
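For reference, here is a minimal sketch of the corrected model, assuming the same load_mnist / X_train / y_train as in the question (the width of 13000 and the epoch count are just illustrative values taken from the question, not requirements):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.models.Sequential()
# random frozen hidden layer, as in the question
model.add(layers.Dense(13000, input_dim=X_train.shape[1], activation='relu', trainable=False))
# sigmoid squashes the single output into (0, 1), so it can be read as
# P(class == 1), which is what binary_crossentropy and binary accuracy expect
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=X_train.shape[0], epochs=1000, verbose=1)

With the sigmoid output, the loss and the accuracy move together, and training accuracy on this two-class subset should approach 100%.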