sklearn: Why do the parameters found by GridSearchCV score worse than the defaults?

Time: 2019-08-14 14:00:55

Tags: python scikit-learn gridsearchcv

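I used GridSearchCV to tune the hyperparameters of a k-nearest-neighbors classifier (KNeighborsClassifier) on my training set, but surprisingly the parameters it returned give worse results on the test set than the defaults. Why would that happen? Any insight into the proper use of GridSearchCV would be appreciated. I need to do this for several algorithms, comparing each one's default results against its tuned results.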

GridSearchCV code:

    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    # Parameters we want to try
    param_grid = {'n_neighbors': [1, 2, 3, 5, 7],
                  'weights': ['uniform', 'distance'],
                  'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
                  'leaf_size': [20, 30, 40],
                  'p': [1, 2, 3],
                  'metric': ['minkowski', 'chebyshev', 'manhattan', 'euclidean']}

    # Define the grid search we want to run. Run it with four CPUs in parallel.
    gs_cv = GridSearchCV(KNeighborsClassifier(), param_grid, n_jobs=4)

    # Run the grid search (should only be on training data!)
    gs_cv.fit(train_X, train_y)

    # Print the best parameters
    print(gs_cv.best_params_)

    # Output:
    # {'algorithm': 'auto', 'leaf_size': 20, 'metric': 'minkowski', 'n_neighbors': 7, 'p': 1, 'weights': 'uniform'}
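
One way to check whether the search itself behaves sensibly is to compare `gs_cv.best_score_` with the cross-validated score of the default estimator on the same folds. The sketch below is only illustrative: the synthetic data from `make_classification`, the reduced grid, and the explicit `StratifiedKFold` splitter are assumptions, since the asker's data and CV settings are not shown. Because the default setting (`n_neighbors=5`, `weights='uniform'`) is part of the grid, the best cross-validated score cannot be lower than the default's on those folds.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's data (assumption, not the real dataset).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# A fixed splitter so both evaluations use exactly the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Reduced grid (assumption) that still contains the default setting.
param_grid = {'n_neighbors': [1, 2, 3, 5, 7],
              'weights': ['uniform', 'distance']}

gs_cv = GridSearchCV(KNeighborsClassifier(), param_grid,
                     cv=cv, scoring='accuracy')
gs_cv.fit(train_X, train_y)

# Cross-validated accuracy of the untouched default estimator.
default_cv_score = cross_val_score(KNeighborsClassifier(), train_X, train_y,
                                   cv=cv, scoring='accuracy').mean()

print("Best CV accuracy:    {:.4f}".format(gs_cv.best_score_))
print("Default CV accuracy: {:.4f}".format(default_cv_score))
```

If the tuned model wins under cross-validation but loses on the held-out set, the gap is more likely to come from the particular test split than from the search.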

Results with those parameters:

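    import numpy as np
    import pandas as pd
    from sklearn.metrics import accuracy_score, log_loss
    from sklearn.neighbors import KNeighborsClassifier

    # Rebuild the classifier with the parameters reported by the grid search
    knn = KNeighborsClassifier(n_neighbors=7,
                               weights='uniform',
                               algorithm='auto',
                               leaf_size=20,
                               p=1,
                               metric='minkowski')
    knn.fit(train_X, train_y)

    print("="*30)
    print('****Results****')

    # Accuracy on the held-out test set
    test_predictions = knn.predict(test_X)
    acc = accuracy_score(test_y, test_predictions)
    print("Accuracy: {:.2%}".format(acc))

    # Log loss on the held-out test set
    test_probabilities = knn.predict_proba(test_X)
    ll = log_loss(test_y, test_probabilities, labels=np.unique(train_y))
    print("Log Loss: {:.4}".format(ll))

    # name, log_cols and log are defined elsewhere in the script (not shown)
    log_entry = pd.DataFrame([[name, acc*100, ll]], columns=log_cols)
    log = pd.concat([log, log_entry])  # DataFrame.append was removed in pandas 2.0

Output:

    ==============================
    ****Results****
    Accuracy: 87.50%
    Log Loss: 0.3354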
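
Retyping the best parameters by hand is not required: with the default `refit=True`, GridSearchCV refits the best configuration on the full training data and exposes it as `best_estimator_`. A minimal sketch, again on synthetic data and with a reduced grid (both assumptions, not the asker's setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's data (assumption).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Reduced grid (assumption); refit=True is the default.
gs_cv = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 3, 5, 7]})
gs_cv.fit(train_X, train_y)

# The best configuration, already refit on all of the training data.
best_knn = gs_cv.best_estimator_

acc = accuracy_score(test_y, best_knn.predict(test_X))
ll = log_loss(test_y, best_knn.predict_proba(test_X),
              labels=np.unique(train_y))

print("Accuracy: {:.2%}".format(acc))
print("Log Loss: {:.4}".format(ll))
```

This avoids transcription mistakes when the same comparison is repeated for several algorithms.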

With the default KNN parameters:

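    import numpy as np
    import pandas as pd
    from sklearn.metrics import accuracy_score, log_loss
    from sklearn.neighbors import KNeighborsClassifier

    # Classifier with all default parameters
    knn = KNeighborsClassifier()
    knn.fit(train_X, train_y)

    print("="*30)
    print('****Results****')

    # Accuracy on the held-out test set
    test_predictions = knn.predict(test_X)
    acc = accuracy_score(test_y, test_predictions)
    print("Accuracy: {:.2%}".format(acc))

    # Log loss on the held-out test set
    test_probabilities = knn.predict_proba(test_X)
    ll = log_loss(test_y, test_probabilities, labels=np.unique(train_y))
    print("Log Loss: {:.4}".format(ll))

    # name, log_cols and log are defined elsewhere in the script (not shown)
    log_entry = pd.DataFrame([[name, acc*100, ll]], columns=log_cols)
    log = pd.concat([log, log_entry])  # DataFrame.append was removed in pandas 2.0

Output:

    ==============================
    ****Results****
    Accuracy: 91.67%
    Log Loss: 0.2398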
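
The two reported accuracies also hint at how small the test set may be. The sketch below searches for the smallest test-set size at which both printed values (87.50% and 91.67%) can arise from whole numbers of correct predictions. It assumes both runs used the same test set and the `{:.2%}` formatting shown above; the function name is made up for illustration, and the real test set could be a larger size that is also consistent.

```python
def smallest_consistent_test_size(reported, max_n=1000):
    """Return (n, counts) for the smallest n where every reported accuracy
    string equals k/n formatted with '{:.2%}' for some integer k."""
    for n in range(1, max_n + 1):
        counts = []
        for target in reported:
            matches = [k for k in range(n + 1)
                       if "{:.2%}".format(k / n) == target]
            if not matches:
                break  # this n cannot produce one of the reported values
            counts.append(matches[0])
        else:
            return n, counts
    return None


result = smallest_consistent_test_size(["87.50%", "91.67%"])
print(result)  # (24, [21, 22])
```

Under these assumptions the smallest consistent size is 24 test samples, with 21 versus 22 correct, so the gap between the tuned and default models could amount to a single prediction.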

0 answers:

No answers yet