GridSearchCV and a Random Forest Regressor built with the same parameters give different results

Asked: 2018-10-15 22:28:22

Tags: machine-learning scikit-learn random-forest grid-search

As the title says, I am trying to use GridSearchCV to find the best parameters for a Random Forest Regressor, and I am measuring the results with MSE.

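from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV

# Split the rows into training and test sets (columns 1-3 are inputs, last column is the target)
Inputs_Treino = dataset.iloc[:253,1:4].values
Outputs_Treino = dataset.iloc[:253,-1].values
Inputs_Teste = dataset.iloc[254:,1:4].values
Outputs_Teste = dataset.iloc[254:,-1].values

estimator = RandomForestRegressor()
para_grids = {
            "n_estimators" : [10,50,100],
            "max_features" : ["auto", "log2", "sqrt"],
            "bootstrap"    : [True, False]
        }

# Cross-validated search over the grid, then keep the refitted best model
grid = GridSearchCV(estimator, para_grids, scoring = 'mean_squared_error')
grid.fit(Inputs_Treino, Outputs_Treino)
forest = grid.best_estimator_

reg_prediction = forest.predict(Inputs_Teste)

print(grid.best_score_, grid.best_params_)

# Error of the tuned model on the held-out test rows
mse = mean_absolute_error(Outputs_Teste, reg_prediction)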

That is the gist of the code (nothing too complicated as far as I can tell, I am just getting started).

When I print grid.best_estimator_, I get:

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_split=1e-07, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           n_estimators=50, n_jobs=1, oob_score=False, random_state=None,
           verbose=0, warm_start=False)

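The problem is that if I create a regressor with these parameters myself (without using grid search at all) and train it the same way, I get a much larger MSE on the test set (5.483837301587303 vs. 43.801520165079467).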

Inputs_Treino = dataset.iloc[:253,1:4].values
Outputs_Treino = dataset.iloc[:253,-1].values
Inputs_Teste = dataset.iloc[254:,1:4].values
Outputs_Teste = dataset.iloc[254:,-1].values

regressor = RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_split=1e-07, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           n_estimators=50, n_jobs=1, oob_score=False, random_state=None,
           verbose=0, warm_start=False)

regressor.fit(Inputs_Treino,Outputs_Treino)

# make the predictions
Teste_Prediction = regressor.predict(Inputs_Teste)

mse = mean_squared_error(Outputs_Teste, Teste_Prediction)

Is this related to the cross-validation that GridSearchCV performs? What am I missing here?
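Two details in the snippets above may be worth checking before looking at cross-validation. First, the grid-search snippet stores the result of mean_absolute_error in mse, while the manual snippet calls mean_squared_error, so the two numbers may not be the same metric. Second, both forests use random_state=None, so two fits are not expected to match exactly. The sketch below illustrates both points. It is only an illustration under assumptions: the data is synthetic (a stand-in for the unknown dataset), the grid is reduced, and cv=3, random_state=0 and the scoring name 'neg_mean_squared_error' are my choices, not the original settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV

# Hypothetical synthetic data standing in for the asker's `dataset`.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 5 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=5.0, size=300)

X_train, y_train = X[:253], y[:253]
X_test, y_test = X[253:], y[253:]

# Assumption: a fixed random_state so that repeated fits are reproducible.
estimator = RandomForestRegressor(random_state=0)
param_grid = {"n_estimators": [10, 50], "bootstrap": [True, False]}

grid = GridSearchCV(estimator, param_grid,
                    scoring="neg_mean_squared_error", cv=3)
grid.fit(X_train, y_train)

# Predictions of the refitted best model on the held-out rows.
grid_pred = grid.best_estimator_.predict(X_test)

# The same predictions scored with two different metrics.
grid_mae = mean_absolute_error(y_test, grid_pred)
grid_mse = mean_squared_error(y_test, grid_pred)

# Rebuild the model by hand from the best parameters, with the same seed.
manual = RandomForestRegressor(random_state=0, **grid.best_params_)
manual.fit(X_train, y_train)
manual_pred = manual.predict(X_test)
manual_mse = mean_squared_error(y_test, manual_pred)

print("best params:          ", grid.best_params_)
print("CV score (negated MSE):", grid.best_score_)
print("grid model test MAE:  ", grid_mae)
print("grid model test MSE:  ", grid_mse)
print("manual model test MSE:", manual_mse)
```

With the seed fixed and the same metric on both sides, the grid-search model and the hand-built model should score the same on the test rows, while the MAE and MSE of a single model can differ a lot. Note also that grid.best_score_ is an average over validation folds of the training data, so it is not expected to equal either test-set number.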

0 answers:

No answers yet