I am comparing Keras and Random Forest. I am following a research paper that reports higher accuracy for the Keras model than for the Random Forest model, but when I implement it myself I get the opposite. *Accuracy and std of RF*: 0.997, 0.0006. Keras accuracy: 0.0079.
# Importing the dataset
import pandas as pd
dataset = pd.read_csv('KDD_Dataset.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 41:42].values
# Encoding the three categorical feature columns
from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
# One-hot encode the three categorical columns in a single pass;
# applying three separate encoders one after another shifts the
# column indices after each transform and corrupts the features.
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features=[0, 1, 2])
X = onehotencoder.fit_transform(X).toarray()
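For illustration, the one-hot transformation itself can be sketched in plain NumPy (the `one_hot` helper below is hypothetical, not part of the question's code):

```python
import numpy as np

def one_hot(column):
    """One-hot encode a 1-D array of integer category labels."""
    categories = np.unique(column)
    # Each row gets a 1 in the position of its category, 0 elsewhere
    return (column[:, None] == categories[None, :]).astype(float)

protocol = np.array([0, 1, 2, 1, 0])  # e.g. an encoded 'protocol_type' column
encoded = one_hot(protocol)
print(encoded.shape)  # (5, 3): one column per distinct category
```

Because each categorical column expands into several 0/1 columns, every column index to the right of it moves, which is why the encoders must be applied together rather than one after another.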
# Encoding the categorical target y
from sklearn.preprocessing import LabelEncoder
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
max(y)  # highest encoded class label
# Splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size = 0.2,
random_state = 1)
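The `test_size=0.2` / `random_state` behaviour can be sketched with the standard library alone (toy indices, assumed for illustration):

```python
import random

# An 80/20 split with a fixed seed, mirroring test_size=0.2 and
# random_state: shuffle the row indices, then cut at 20%.
random.seed(1)
indices = list(range(10))
random.shuffle(indices)
test_idx, train_idx = indices[:2], indices[2:]
print(len(train_idx), len(test_idx))  # 8 2
```

Fixing the seed makes the split reproducible across runs, which matters when comparing two models on the same data.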
"""sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""
#Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
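Why the scaler is fit on the training set only (and merely applied to the test set) can be shown with a minimal NumPy sketch (toy data, assumed for illustration):

```python
import numpy as np

# Fit scaling statistics on the training data only, then reuse them
# on the test data -- mirroring fit_transform / transform above.
X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_test = np.array([[3.0, 20.0]])

mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std, as StandardScaler uses

X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma
print(X_train_scaled.mean(axis=0))  # ~[0, 0]
```

Recomputing the statistics on the test set would leak test information into the preprocessing and make the evaluation optimistic.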
# Fitting a Random Forest classifier to the training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 500,
criterion = 'entropy',
random_state = 0,
oob_score = True)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
# Making the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
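What `confusion_matrix` computes can be sketched in plain Python (toy labels, for illustration only):

```python
# A confusion matrix counts (true class, predicted class) pairs:
# cm[t][p] is how often class t was predicted as class p.
y_true = [0, 0, 1, 1, 2]
y_hat  = [0, 1, 1, 1, 2]
n_classes = 3
cm = [[0] * n_classes for _ in range(n_classes)]
for t, p in zip(y_true, y_hat):
    cm[t][p] += 1
print(cm)  # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
```

The diagonal holds the correct predictions, so a quick sanity check on the RF result is whether nearly all mass sits on the diagonal.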
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator= classifier,
X = X_train,
y = y_train,
cv=10)
accuracies_mean = accuracies.mean()
accuracies_std = accuracies.std()
print("Accuracy and STD of RF")
print(accuracies_mean)
print(accuracies_std)
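The mean/std summary that `cross_val_score` feeds can be reproduced by hand (toy fold scores, assumed for illustration; not the poster's actual folds):

```python
# Mean and standard deviation over k cross-validation fold scores,
# as printed above for the Random Forest.
scores = [0.996, 0.997, 0.998, 0.997, 0.997]
mean = sum(scores) / len(scores)
std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
print(mean, std)
```

A small std across folds, as reported here, indicates the accuracy estimate is stable and not an artifact of one lucky split.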
# Keras model
import numpy
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
numpy.random.seed(7)
model = Sequential()
model.add(Dense(12, input_dim=45, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
import datetime, os
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
model.fit(X_train, y_train,
batch_size=50000,
epochs=10,
verbose=1,
validation_data=(X_test, y_test),
callbacks=[tensorboard_callback])
y_pred = model.predict(X_test)
score = model.evaluate(X_test, y_test, verbose=1)
Please suggest how I can improve the accuracy of the Keras model.
Answer (score: 0)
You are in a classification (and not a regression) setting, so you should not use MSE as your loss function in Keras (it is meant for regression problems); change your model compilation to
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
For more details see What function defines accuracy in Keras when the loss is mean squared error (MSE)?, although the setting there is the "inverse" one (trying to use accuracy in a regression problem).
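As a minimal sketch of why the loss choice matters (toy numbers, not the poster's data): cross-entropy penalizes a confident wrong prediction far more sharply than MSE does, which gives the network a much stronger gradient to learn from in classification:

```python
import math

# Compare the two losses on a single positive example (y = 1)
# for predictions of varying confidence.
def mse(y, p):
    return (y - p) ** 2

def binary_crossentropy(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for p in (0.9, 0.5, 0.1):
    print(p, mse(1, p), binary_crossentropy(1, p))
```

At p = 0.1 (a confident miss), MSE caps out at 0.81 while cross-entropy grows to about 2.3, and it grows without bound as p approaches 0.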
Your batch_size=50000 looks extremely high, but as long as you do not run into memory problems it will work.
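One way to see the practical effect of such a large batch: it collapses the number of gradient updates per epoch (the ~100k sample count below is assumed for illustration; adjust to your dataset size):

```python
import math

# Gradient updates per epoch = ceil(n_samples / batch_size).
# With batch_size=50000 on ~100k rows, each epoch performs only
# 2 weight updates, versus thousands with a conventional batch size.
n_samples = 100_000
for batch_size in (50_000, 32):
    updates_per_epoch = math.ceil(n_samples / batch_size)
    print(batch_size, updates_per_epoch)
```

With 10 epochs that is only about 20 updates in total, which on its own could keep the network near its random initialization.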