Why do I get different scores from the same operation?

Asked: 2018-12-17 05:19:19

Tags: machine-learning scikit-learn linear-regression pipeline grid-search

I have a pipeline consisting of a feature-union object (num_cat_union) followed by a LinearRegression.

When I apply the feature union to my data first and then grid-search the linear regression on the transformed output, I get an RMSE of 32760.

However, when I run the grid search on the full pipeline (the SAME feature union plus the linear regression), I get an RMSE of 91490.

How can that be? Why the difference?

import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# NumSelector/NumImputer and CatSelector/CatImputer are custom transformers
# for the numeric and categorical columns (their definitions are not shown).
num_pipeline = make_pipeline(NumSelector(), NumImputer())
cat_pipeline = make_pipeline(CatSelector(), CatImputer(), OneHotEncoder(handle_unknown='ignore', sparse=False))

num_cat_union = make_union(num_pipeline, cat_pipeline)
full_pipe = make_pipeline(num_cat_union, LinearRegression())
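The four custom transformers are not defined in the post. For completeness, here is a minimal sketch of what they might look like; it is only an assumption based on their names and on the grid-search parameter used below (a pandas DataFrame input, dtype-based column selection, and a strategy parameter on NumImputer), not the actual implementation.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class NumSelector(BaseEstimator, TransformerMixin):
    """Select the numeric columns of a DataFrame."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X.select_dtypes(include=[np.number])

class CatSelector(BaseEstimator, TransformerMixin):
    """Select the non-numeric (categorical) columns of a DataFrame."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X.select_dtypes(exclude=[np.number])

class NumImputer(BaseEstimator, TransformerMixin):
    """Fill missing numeric values; `strategy` is a constructor parameter
    so GridSearchCV can tune 'numimputer__strategy'."""
    def __init__(self, strategy='mean'):
        self.strategy = strategy
    def fit(self, X, y=None):
        self.fill_ = X.mean() if self.strategy == 'mean' else X.median()
        return self
    def transform(self, X):
        return X.fillna(self.fill_)

class CatImputer(BaseEstimator, TransformerMixin):
    """Fill missing categorical values with each column's mode."""
    def fit(self, X, y=None):
        self.fill_ = X.mode().iloc[0]
        return self
    def transform(self, X):
        return X.fillna(self.fill_)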

def model_metrics(model, params, scoring, X, y):
    """Grid-search `model` over `params` and report RMSE and timing."""
    grid = GridSearchCV(model, params, scoring=scoring, error_score=0)
    grid.fit(X, y)
    print('X:', X.shape)
    print('Best RMSE: ', np.sqrt(-grid.best_score_))  # scoring is negative MSE, so negate before the square root
    print('Best parameters: ', grid.best_params_)
    print('Mean fit time: ', round(grid.cv_results_['mean_fit_time'].mean(), 3))
    print('Mean scoring time: ', round(grid.cv_results_['mean_score_time'].mean(), 3))


# The following gives ~90000 RMSE

params = {'featureunion__pipeline-1__numimputer__strategy': ['mean', 'median']}

model_metrics(full_pipe, params, 'neg_mean_squared_error', housing, y)

# The following gives ~30000 RMSE (P.S. the default imputation strategy is already 'mean')

# Here the feature union is fit on the entire dataset up front,
# and only the LinearRegression itself is grid-searched.
X = num_cat_union.fit_transform(housing)
lin_params = {}

model_metrics(LinearRegression(), lin_params, 'neg_mean_squared_error', X, y)

#Shouldn't they give the same results??
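For anyone trying to reproduce the two calling patterns, here is a self-contained sketch on toy data. The toy DataFrame and target below are assumptions standing in for the original housing data and y, and the sketch assumes a scikit-learn version where OneHotEncoder still accepts sparse=False, as in the snippet above.

rng = np.random.RandomState(0)
toy = pd.DataFrame({
    'rooms': rng.normal(5, 1, 200),
    'age':   rng.normal(30, 10, 200),
    'ocean': rng.choice(['NEAR BAY', 'INLAND', 'ISLAND'], 200),
})
toy.loc[::17, 'rooms'] = np.nan   # sprinkle in missing values so the imputers have work to do
toy.loc[::23, 'ocean'] = np.nan
y_toy = 3 * toy['rooms'].fillna(5) - 0.5 * toy['age'] + rng.normal(0, 1, 200)

# Pattern 1: the preprocessing lives inside the pipeline, so the feature union
# is refit on each training fold during cross-validation.
model_metrics(full_pipe, params, 'neg_mean_squared_error', toy, y_toy)

# Pattern 2: the feature union is fit once on all rows before the grid search,
# and only the LinearRegression is cross-validated.
X_toy = num_cat_union.fit_transform(toy)
model_metrics(LinearRegression(), {}, 'neg_mean_squared_error', X_toy, y_toy)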

0 Answers