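I am using HDF5Matrix to load a dataset and train my model with it. In the first epoch I get about 10% accuracy. My dataset is not very large at the moment, so I can copy the contents of the HDF5Matrix into a numpy array and train with that instead. I re-initialize the model, and this time I get 40% accuracy in the first epoch.
For more information about HDF5Matrix, see this example.
I understand that in the fit method, the shuffle parameter must be either False or 'batch' when using HDF5Matrix. Either way, I get the same behavior.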
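For reference, the two shuffle modes can be illustrated with plain numpy. This is only a minimal sketch of the idea: the Keras documentation describes shuffle='batch' as shuffling in batch-sized chunks, and the helper names below (batch_shuffle_indices, full_shuffle_indices) are hypothetical, not Keras functions.

```python
import numpy as np


def batch_shuffle_indices(n_samples, batch_size, rng):
    """Shuffle the order of batch-sized chunks (hypothetical helper).

    Samples inside each chunk stay in their original, contiguous order,
    and the incomplete last chunk stays at the end.
    """
    indices = np.arange(n_samples)
    n_full = n_samples // batch_size
    full = indices[:n_full * batch_size].reshape(n_full, batch_size)
    tail = indices[n_full * batch_size:]
    full = full[rng.permutation(n_full)]  # permute whole chunks only
    return np.concatenate([full.ravel(), tail])


def full_shuffle_indices(n_samples, rng):
    """Shuffle every sample index independently (hypothetical helper)."""
    return rng.permutation(n_samples)


rng = np.random.default_rng(0)
batch_order = batch_shuffle_indices(10, 4, rng)
full_order = full_shuffle_indices(10, rng)
print("chunk-wise shuffle:", batch_order)
print("full shuffle:      ", full_order)
```

With chunk-wise shuffling, every mini-batch always contains the same neighboring rows of the file and only the order of the mini-batches changes, whereas a full shuffle mixes rows from anywhere in the dataset into each mini-batch.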
Has anyone run into the same problem? Can you tell me what I am doing wrong?
Here is a snippet of the code:
Using HDF5Matrix
from keras.utils.io_utils import HDF5Matrix
x_train = HDF5Matrix('../data/default_data.h5', 'data')
y_train = HDF5Matrix('../data/default_data.h5', 'labels')
# create the model
...
# train the model
model.fit(x_train, y_train, epochs=200, batch_size=2048, shuffle='batch')
# which outputs:
Epoch 1/200
1758510/1758510 [==============================] - 42s - loss: 2.5574 - categorical_accuracy: 0.1032
Epoch 2/200
1758510/1758510 [==============================] - 41s - loss: 2.3145 - categorical_accuracy: 0.1553
Epoch 3/200
1758510/1758510 [==============================] - 41s - loss: 2.1931 - categorical_accuracy: 0.2067
Epoch 4/200
694272/1758510 [==========>...................] - ETA: 24s - loss: 2.1055 - categorical_accuracy: 0.2328
Using a numpy array
# create the model again
...
# copy the HDF5Matrix to a numpy array
X_training = x_train[0:1758510]
Y_training = y_train[0:1758510]
# check X_training is equal to x_train
...
# train the model again
model.fit(X_training,
          Y_training,
          epochs=200,
          batch_size=256,
          shuffle=True)
# which outputs
Epoch 1/200
1758510/1758510 [==============================] - 27s - loss: 1.5019 - categorical_accuracy: 0.4710
Epoch 2/200
89600/1758510 [>.............................] - ETA: 26s - loss: 1.2786 - categorical_accuracy: 0.5523
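Note that the two runs above differ in more than the data container: the first uses batch_size=2048 with shuffle='batch', the second batch_size=256 with shuffle=True. As a small worked calculation (a sketch, assuming one weight update per mini-batch and using the sample count from the logs), the number of updates per epoch differs considerably. The function name updates_per_epoch is hypothetical.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batches per epoch, counting the partial last batch."""
    return math.ceil(n_samples / batch_size)


n_samples = 1758510  # sample count shown in the training logs

steps_hdf5 = updates_per_epoch(n_samples, 2048)  # HDF5Matrix run
steps_numpy = updates_per_epoch(n_samples, 256)  # numpy run

print(steps_hdf5)   # 859
print(steps_numpy)  # 6870
```

So the numpy run performs roughly eight times as many updates in its first epoch, which means the first-epoch accuracies of the two runs are not directly comparable unless batch_size is the same in both.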
Thank you very much.