As the title says, I am trying to use GridSearchCV to find the best parameters for a random forest regression, and I am measuring the results with MSE.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_absolute_error

# train/test split by row index
Inputs_Treino = dataset.iloc[:253, 1:4].values
Outputs_Treino = dataset.iloc[:253, -1].values
Inputs_Teste = dataset.iloc[254:, 1:4].values
Outputs_Teste = dataset.iloc[254:, -1].values

estimator = RandomForestRegressor()
para_grids = {
    "n_estimators": [10, 50, 100],
    "max_features": ["auto", "log2", "sqrt"],
    "bootstrap": [True, False]
}

grid = GridSearchCV(estimator, para_grids, scoring='mean_squared_error')
grid.fit(Inputs_Treino, Outputs_Treino)

# take the best estimator found by the search and score it on the hold-out set
forest = grid.best_estimator_
reg_prediction = forest.predict(Inputs_Teste)
print(grid.best_score_, grid.best_params_)
mse = mean_absolute_error(Outputs_Teste, reg_prediction)
That is the gist of the code (nothing too complicated as far as I know, it is just written from scratch).
When I print grid.best_estimator_ I get:
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_split=1e-07, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           n_estimators=50, n_jobs=1, oob_score=False, random_state=None,
           verbose=0, warm_start=False)
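(Side note: I assume the winning configuration could also be rebuilt straight from grid.best_params_ instead of copying the printed repr. A minimal sketch of what I mean, not what I actually ran:)

from sklearn.ensemble import RandomForestRegressor

# hypothetical alternative: pass only the tuned parameters from the search,
# leaving everything else at the library defaults
rebuilt = RandomForestRegressor(**grid.best_params_)
rebuilt.fit(Inputs_Treino, Outputs_Treino)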
The problem is that if I create a regressor with those exact parameters (no grid search at all) and train it the same way, I get a much larger MSE on the test set (5.483837301587303 vs 43.801520165079467):
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# same train/test split as above
Inputs_Treino = dataset.iloc[:253, 1:4].values
Outputs_Treino = dataset.iloc[:253, -1].values
Inputs_Teste = dataset.iloc[254:, 1:4].values
Outputs_Teste = dataset.iloc[254:, -1].values

# regressor built with the parameters reported by grid.best_estimator_
regressor = RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                                  max_features='auto', max_leaf_nodes=None,
                                  min_impurity_split=1e-07, min_samples_leaf=1,
                                  min_samples_split=2, min_weight_fraction_leaf=0.0,
                                  n_estimators=50, n_jobs=1, oob_score=False, random_state=None,
                                  verbose=0, warm_start=False)
regressor.fit(Inputs_Treino, Outputs_Treino)

# make the predictions and score them on the hold-out set
Teste_Prediction = regressor.predict(Inputs_Teste)
mse = mean_squared_error(Outputs_Teste, Teste_Prediction)
Does this have something to do with the cross-validation that GridSearchCV performs? What am I missing here?
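If it is the cross-validation, my rough idea for checking it would be to cross-validate the same configuration on the training rows and compare that number with the hold-out MSE above. This is just an untested sketch, assuming a scikit-learn version that accepts the 'neg_mean_squared_error' scorer:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# cross-validated MSE of the winning configuration, computed on the training data only;
# the sign is flipped because scikit-learn maximizes scores
candidate = RandomForestRegressor(n_estimators=50, max_features='auto', bootstrap=True)
cv_mse = -cross_val_score(candidate, Inputs_Treino, Outputs_Treino,
                          scoring='neg_mean_squared_error', cv=5).mean()
print(cv_mse)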