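I am trying to build a neural network to classify poisonous mushrooms, but the results do not look right. The model compiles successfully, but can someone give me some intuition for why the training results look so accurate after only a few epochs? That seems wrong. Did I make a mistake in the data preprocessing?
The dataset can be found here: https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data
Here is the code: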
import keras.utils
from keras.models import Sequential
from keras.layers import Dense
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
# seed weights
np.random.seed(3)
# import dataset
data = pd.read_csv('agaricus-lepiota.csv', delimiter=',')
# encode labels as integers so they can be one-hot-encoded, which takes an int matrix
le = preprocessing.LabelEncoder()
data = data.apply(le.fit_transform)
# one-hot-encode string data (now type int)
ohe = preprocessing.OneHotEncoder(sparse=False)
data = ohe.fit_transform(data)
X = data[:, 1:23]
Y = data[:, 0:1]
# split into test and train set
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=.2, random_state=5)
# create model
model = Sequential()
model.add(Dense(500, input_dim=22, activation='relu'))
model.add(Dense(300, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=1000, batch_size=25)
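Answer 0 (score: 0)
An epoch consists of many iterations (n = training_set_size / batch_size), so the weights are updated many times within a single epoch. Given that you have so many layers and no regularization, I would suspect overfitting.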
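To put rough numbers on that, here is a small sketch of the arithmetic. It assumes the UCI file has 8124 rows (this figure is not stated in the question) and reuses the question's test_size=.2, batch_size=25 and epochs=1000. The variable names are only illustrative.

```python
import math

# Assumed figures: 8124 rows in the UCI mushroom file (not stated in the
# question); test_size, batch_size and epochs are taken from the question.
n_samples = 8124
test_size = 0.2
batch_size = 25
epochs = 1000

# train_test_split rounds the test share up and gives the rest to training.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

# One weight update happens per batch; the last batch may be partial.
iterations_per_epoch = math.ceil(n_train / batch_size)
total_updates = iterations_per_epoch * epochs

print(n_train, iterations_per_epoch, total_updates)
```

Under these assumptions a single epoch already amounts to a few hundred weight updates, so a sharp rise in accuracy within the first few epochs is not surprising in itself.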