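I have a dataset of more than 16k vectors (21 dimensions each).
I use 80% of it for training and 20% for testing.
I trained a neural network and a naive Bayes classifier on this dataset.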
Loading and splitting the dataset
import numpy as np
from sklearn.model_selection import train_test_split

# Load the CSV; columns 1..21 are the inputs, columns 22 onward are the target
data_set = np.loadtxt("./data/_vector21.csv", delimiter=",")
inp_vec = data_set[:, 1:22]
out_vec = data_set[:, 22:]
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(inp_vec, out_vec, test_size=0.2)  # 80% training and 20% test
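In case it matters, here is a small shape check on a made-up array. The 23-column layout (id, 21 features, one label column) is only my assumption about what the slicing above produces, and `demo`, `out_2d` and `out_1d` are names I made up for this example. As far as I know, scikit-learn classifiers expect a 1-D label vector and warn when given a column vector, so `ravel()` may be needed.

```python
import numpy as np

# Hypothetical stand-in for the CSV: 5 rows, 23 columns
# (column 0 = id, columns 1..21 = features, column 22 = label; this layout is an assumption)
demo = np.arange(5 * 23, dtype=float).reshape(5, 23)

inp_vec = demo[:, 1:22]   # 21 feature columns
out_2d = demo[:, 22:]     # slice keeps a 2-D column vector
out_1d = out_2d.ravel()   # flattened to a 1-D label vector

print(inp_vec.shape)  # (5, 21)
print(out_2d.shape)   # (5, 1)
print(out_1d.shape)   # (5,)
```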
Neural network
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(21, 100, 100, 6), max_iter=1000)
mlp.fit(X_train, y_train)
predictions = mlp.predict(X_test)
print("\nAccuracy: %.2f%%\n" % (accuracy_score(y_test, predictions)*100))
# Accuracy: 61.26%
Naive Bayes
# Create a Gaussian Classifier
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
# Train the model using the training sets
model.fit(X_train, y_train)
# Predict the response for test dataset
y_pred = model.predict(X_test)
print("\nAccuracy: %.3f%%" % (metrics.accuracy_score(y_test, y_pred)*100))
# Accuracy: 34.050%
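I expected the accuracy of the neural network and the naive Bayes model to be closer to each other.
Can anyone tell me what I am doing wrong and how to fix it?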
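Since I cannot share the CSV, here is a minimal self-contained sketch of the same setup on synthetic data, in case someone wants to reproduce it. The data from `make_classification`, the assumption of 6 classes, the `compare_models` helper and the majority-class baseline are all mine and are not part of my real pipeline; the accuracies it prints will not match the numbers above.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=0):
    """Fit each classifier on one shared 80/20 split and return test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # Fit the scaler on the training part only, then apply it to both parts
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    models = {
        # Always predicts the most frequent training class; gives a floor for comparison
        "majority_baseline": DummyClassifier(strategy="most_frequent"),
        "gaussian_nb": GaussianNB(),
        # Same layer sizes as in my question
        "mlp": MLPClassifier(
            hidden_layer_sizes=(21, 100, 100, 6), max_iter=1000, random_state=seed
        ),
    }
    scores = {}
    for name, clf in models.items():
        clf.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, clf.predict(X_test))
    return scores


# Synthetic stand-in for my data: 21 features, 6 classes (assumed)
X, y = make_classification(
    n_samples=2000,
    n_features=21,
    n_informative=10,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=0,
)
scores = compare_models(X, y)
for name, acc in scores.items():
    print("%s accuracy: %.2f%%" % (name, acc * 100))
```

Running all models on the same split with a fixed `random_state` at least rules out the random split as the source of the gap.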