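I have a pipeline that consists of a FeatureUnion object (num_cat_union) followed by a linear regression.
When I apply the feature union to my data myself and then run a grid search over the linear regression, I get an RMSE of 32760.
However, when I run the pipeline with the SAME feature union, linear regression and grid search, I get an RMSE of 91490.
What could be going on here? Why is there a difference?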
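import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import OneHotEncoder

# NumSelector, NumImputer, CatSelector and CatImputer are my own custom transformers (not shown)
num_pipeline = make_pipeline(NumSelector(), NumImputer())
# Note: `sparse` was renamed to `sparse_output` in scikit-learn 1.2
cat_pipeline = make_pipeline(CatSelector(), CatImputer(), OneHotEncoder(handle_unknown='ignore', sparse=False))
num_cat_union = make_union(num_pipeline, cat_pipeline)
full_pipe = make_pipeline(num_cat_union, LinearRegression())

def model_metrics(model, params, scoring, X, y):
    # Grid-search `model` over `params` and report the best cross-validated RMSE
    grid = GridSearchCV(model, params, scoring=scoring, error_score=0)
    grid.fit(X, y)
    print('X:', X.shape)
    print('Best RMSE: ', np.sqrt(-grid.best_score_))
    print('Best parameters: ', grid.best_params_)
    # Fit time is averaged directly; taking its square root (as before) was a bug
    print('Mean fit time: ', round(grid.cv_results_['mean_fit_time'].mean(), 3))
    print('Mean scoring time: ', round(grid.cv_results_['mean_score_time'].mean(), 3))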
#The following gives ~90000 RMSE
params = {'featureunion__pipeline-1__numimputer__strategy': ['mean', 'median']}
model_metrics(full_pipe, params, 'neg_mean_squared_error', housing, y)
#The following gives ~30000 RMSE (P.S. the default strategy for imputing is mean already)
X = num_cat_union.fit_transform(housing)
lin_params = {}
model_metrics(LinearRegression(), lin_params, 'neg_mean_squared_error', X, y)
#Shouldn't they give the same results??
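One structural difference between the two runs is where the preprocessing is fit. In the first run, GridSearchCV refits the whole pipeline, imputer and one-hot encoder included, on each training fold only. In the second, `num_cat_union.fit_transform(housing)` fits them once on every row, and only the regression is cross-validated. Below is a minimal, self-contained sketch of the same two-way comparison that isolates this effect. It rests on several assumptions: the data is synthetic (a hypothetical stand-in for `housing`, including a deliberately rare category), and `ColumnTransformer` with `SimpleImputer` replaces the custom `NumSelector`/`NumImputer`/`CatSelector`/`CatImputer` classes, which are not shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 500
# Hypothetical synthetic data: two numeric columns and one categorical column
# that contains a rare category ("rare", ~1% of rows).
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "cat": rng.choice(["a", "b", "c", "rare"], size=n, p=[0.4, 0.3, 0.29, 0.01]),
})
y = 3 * df["x1"] - 2 * df["x2"] + (df["cat"] == "rare") * 50 + rng.normal(size=n)
df.loc[rng.random(n) < 0.1, "x1"] = np.nan  # inject missing values to impute

# Stand-in for num_cat_union: impute numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["x1", "x2"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat"]),
])

def best_rmse(model, params, X, y):
    # Same idea as model_metrics: grid search with 5-fold CV, return best RMSE.
    grid = GridSearchCV(model, params, scoring="neg_mean_squared_error", cv=5)
    grid.fit(X, y)
    return np.sqrt(-grid.best_score_)

# Run A: preprocessing is refit inside every CV training fold.
rmse_pipe = best_rmse(make_pipeline(preprocess, LinearRegression()),
                      {"columntransformer__num__strategy": ["mean", "median"]}, df, y)

# Run B: preprocessing is fit once on all rows, then only the regressor is cross-validated.
X_all = preprocess.fit_transform(df)
rmse_pre = best_rmse(LinearRegression(), {}, X_all, y)

print("pipeline inside CV:", rmse_pipe)
print("preprocessed before CV:", rmse_pre)
```

In run B the encoder has already seen every category, including ones that may be absent from a given training fold. In run A such a category is unknown at fit time, and `handle_unknown="ignore"` encodes it as all zeros in the validation fold. Comparing the two printed values on data like this is one way to check how much of the gap comes from that difference rather than from the model itself.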