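Keras accuracy is not improving

Time: 2019-09-04 09:29:43

Tags: python machine-learning keras random-forest

I am comparing Keras and Random Forest. I am following a research paper in which the Keras model achieves higher accuracy than the Random Forest model, but when I implement it, I do not get that result. RF accuracy and standard deviation: 0.997 and 0.0006; Keras accuracy: 0.0079.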

    # Importing the dataset
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder

    dataset = pd.read_csv('KDD_Dataset.csv')
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, 41:42].values

    # Encoding categorical data X
    labelencoder_X = LabelEncoder()
    X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
    X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
    X[:, 2] = labelencoder_X.fit_transform(X[:, 2])

    # `categorical_features` was deprecated in scikit-learn 0.20 and removed in 0.22.
    # The encoded columns are stacked to the left of the output, so after the first
    # transform, indices 1 and 2 no longer point at the original columns 1 and 2.
    onehotencoder_0 = OneHotEncoder(categorical_features=[0])
    onehotencoder_1 = OneHotEncoder(categorical_features=[1])
    onehotencoder_2 = OneHotEncoder(categorical_features=[2])
    X = onehotencoder_0.fit_transform(X).toarray()
    X = onehotencoder_1.fit_transform(X).toarray()
    X = onehotencoder_2.fit_transform(X).toarray()

    # Encoding categorical data y
    labelencoder_y = LabelEncoder()
    y = labelencoder_y.fit_transform(y.ravel())  # LabelEncoder expects a 1-D array
    max(y)
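In current scikit-learn the `categorical_features` argument no longer exists; the usual replacement is a `ColumnTransformer` that one-hot encodes the chosen columns in a single pass, which also avoids the shifting-index problem of chaining three encoders. A minimal sketch, assuming (as in the code above) that columns 0, 1 and 2 are the categorical ones; the tiny DataFrame and its column names are hypothetical stand-ins for `KDD_Dataset.csv`:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for KDD_Dataset.csv: columns 0-2 categorical, column 3 numeric
df = pd.DataFrame({
    "cat_0": ["tcp", "udp", "tcp", "icmp"],
    "cat_1": ["http", "ftp", "http", "smtp"],
    "cat_2": ["SF", "REJ", "SF", "SF"],
    "num_3": [0.0, 1.5, 2.0, 3.5],
})

# One-hot encode columns 0, 1, 2 by position and pass the rest through unchanged.
# OneHotEncoder accepts strings directly, so no LabelEncoder step is needed for X.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), [0, 1, 2])],
    remainder="passthrough",
    sparse_threshold=0.0,  # always return a dense array
)
X_encoded = ct.fit_transform(df)
print(X_encoded.shape)  # (4, 9): 3 + 3 + 2 one-hot columns, then num_3
```

The encoded columns come first and the passthrough columns follow, so the number of inputs for the Keras model is simply `X_encoded.shape[1]`.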

Splitting the dataset into the Training set and Test set

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size = 0.2,
                                                        random_state = 1)

    # Feature Scaling
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

Fitting Random Forest Classification to the Training set

    from sklearn.ensemble import RandomForestClassifier
    classifier = RandomForestClassifier(n_estimators = 500, 
                                        criterion = 'entropy', 
                                        random_state = 0,
                                        oob_score = True)
    classifier.fit(X_train, y_train)

    y_pred = classifier.predict(X_test)

Making the Confusion Matrix

    from sklearn.metrics import confusion_matrix
    cm = confusion_matrix(y_test, y_pred)


    from sklearn.model_selection import cross_val_score
    accuracies = cross_val_score(estimator= classifier, 
                                 X = X_train,
                                 y = y_train,
                                 cv=10)

    accuracies_mean = accuracies.mean()
    accuracies_std = accuracies.std()

    print("Accuracy and STD of RF")
    print(accuracies_mean)
    print(accuracies_std)

Keras model

    import datetime, os
    import numpy
    import tensorflow as tf
    from keras import optimizers
    from keras.layers import Dense
    from keras.models import Sequential

    model = Sequential()
    model.add(Dense(12, input_dim=45, activation='relu'))  # input_dim must equal X_train.shape[1]
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    numpy.random.seed(7)
    logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
    # Created but never used: fit() below is called with callbacks=None
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)

    model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])

    # Re-assigning sgd after compile() has no effect on the already compiled model
    sgd = optimizers.SGD(lr=0.01, clipnorm=1.)

    model.fit(X_train, y_train,
              batch_size=50000,
              epochs=10,
              verbose=1,
              validation_data=(X_test, y_test),
              callbacks=None)

    y_pred = model.predict(X_test)

    score = model.evaluate(X_test, y_test, verbose=1)
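The `max(y)` check in the preprocessing step hints at a related point: a single sigmoid unit outputs values in (0, 1), so it can only model a two-class label. A minimal sketch of choosing the output layer from the encoded labels, assuming `y` comes from `LabelEncoder` (integers 0..k-1); `output_config` is a hypothetical helper, and the sigmoid/softmax pairings are the common Keras conventions rather than anything stated in the question:

```python
import numpy as np

def output_config(y):
    """Hypothetical helper: suggest (units, activation, loss) for a Keras output layer.

    Assumes y holds LabelEncoder output, i.e. integers 0..k-1.
    """
    n_classes = int(np.max(y)) + 1
    if n_classes <= 2:
        # Binary label: one sigmoid unit predicting P(y == 1)
        return 1, "sigmoid", "binary_crossentropy"
    # k classes with integer labels: one softmax unit per class
    return n_classes, "softmax", "sparse_categorical_crossentropy"

print(output_config(np.array([0, 1, 1, 0])))     # (1, 'sigmoid', 'binary_crossentropy')
print(output_config(np.array([0, 3, 1, 2, 3])))  # (4, 'softmax', 'sparse_categorical_crossentropy')
```

The returned values would go into `Dense(units, activation=activation)` for the last layer and `model.compile(loss=loss, ...)`.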

Please suggest how I can improve the Keras accuracy.

1 Answer:

Answer 0 (score: 0)

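You are in a classification (not regression) setting, so you should not use MSE as the loss function in Keras (it is meant for regression problems); change your model compilation to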

    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

For more details, see What function defines accuracy in Keras when the loss is mean squared error (MSE)?, although the setting there is the reverse (trying to use accuracy in a regression problem).
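One way to see why cross-entropy suits a sigmoid output better than MSE: for a confidently wrong prediction, the MSE gradient with respect to the logit is scaled by p(1 - p) and almost vanishes, while the binary cross-entropy gradient stays close to 1 in magnitude. A minimal NumPy sketch of the two textbook derivatives (the math only, not Keras internals):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(z, y):
    return (sigmoid(z) - y) ** 2

def bce_loss(z, y):
    p = sigmoid(z)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_mse(z, y):
    # d/dz (p - y)^2 = 2 (p - y) p (1 - p), with p = sigmoid(z)
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

def grad_bce(z, y):
    # d/dz of binary cross-entropy simplifies to p - y
    return sigmoid(z) - y

# Confidently wrong: true label 1, logit -8 (p is about 0.0003)
z, y = -8.0, 1.0
print("MSE gradient:", grad_mse(z, y))
print("BCE gradient:", grad_bce(z, y))
```

Here the cross-entropy gradient is more than a thousand times larger, so SGD can still correct the mistake instead of stalling.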

Your batch_size=50000 also looks very high, but if you don't run into memory issues, it should be OK.
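Beyond memory, the batch size also fixes how many weight updates training performs: with `epochs=10`, the number of optimizer steps is ceil(n_train / batch_size) × 10. A quick sketch, using a hypothetical training-set size of 400,000 rows (the question does not state the real size):

```python
import math

def optimizer_steps(n_train, batch_size, epochs):
    """Number of gradient updates when the last partial batch is kept."""
    return math.ceil(n_train / batch_size) * epochs

n_train = 400_000  # hypothetical; replace with len(X_train)
print(optimizer_steps(n_train, 50_000, 10))  # 80
print(optimizer_steps(n_train, 128, 10))     # 31250
```

Only a few dozen updates may be too few for SGD with `lr=0.01`, so a smaller batch or more epochs could matter as much as the memory question.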