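I am using XGBRegressor inside a pipeline. The pipeline contains the preprocessing steps and the model (XGBRegressor).
Below are the complete preprocessing steps. (I have already defined numeric_cols and cat_cols.)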
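from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Numeric columns: fill missing values (SimpleImputer defaults to the mean)
numerical_transfer = SimpleImputer()
# Categorical columns: fill missing values with the mode, then one-hot encode
cat_transfer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transfer, numeric_cols),
        ('cat', cat_transfer, cat_cols)
    ])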
The final pipeline is:
my_model = Pipeline(steps = [('preprocessor', preprocessor), ('model', model)])
When I fit without early_stopping_rounds, the code works fine:
my_model.fit(X_train, y_train)
But when I use early_stopping_rounds as shown below, I get an error.
my_model.fit(X_train, y_train, model__early_stopping_rounds=5, model__eval_metric = "mae", model__eval_set=[(X_valid, y_valid)])
The error is raised at model__eval_set=[(X_valid, y_valid)], and the message is:
ValueError: DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in fields MSZoning, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Condition1, Condition2, BldgType, HouseStyle, RoofStyle, RoofMatl, MasVnrType, ExterQual, ExterCond, Foundation, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinType2, Heating, HeatingQC, CentralAir, Electrical, KitchenQual, Functional, FireplaceQu, GarageType, GarageFinish, GarageQual, GarageCond, PavedDrive, PoolQC, Fence, MiscFeature, SaleType, SaleCondition
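Does this mean I should preprocess X_valid before calling my_model.fit(), or am I doing something else wrong?
If the problem is that X_valid needs to be preprocessed before calling fit(), how do I do that with the preprocessor I defined above?
Edit: I tried to preprocess X_valid without the pipeline, but I got an error saying the features do not match.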
Answer 0: (score: 1)
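The problem is that the pipeline does not apply its preprocessing steps to eval_set: fit parameters such as model__eval_set are handed to the final step unchanged, so XGBRegressor receives the raw X_valid with its string columns. So, as you said, you need to preprocess X_valid yourself. The easiest way to do that is to use a pipeline without the 'model' step. Before fitting the full pipeline, use the following code: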
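# Make a copy to avoid changing the original data
X_valid_eval = X_valid.copy()
# Build a pipeline with the model step removed
eval_set_pipe = Pipeline(steps=[('preprocessor', preprocessor)])
# Fit the preprocessor on the training data, then transform the copy of X_valid
X_valid_eval = eval_set_pipe.fit(X_train, y_train).transform(X_valid_eval)
Then fit your pipeline after changing model__eval_set to use the transformed validation data:
my_model.fit(X_train, y_train, model__early_stopping_rounds=5, model__eval_metric="mae", model__eval_set=[(X_valid_eval, y_valid)])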
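The "features do not match" error mentioned in the question's edit is most likely what happens when the preprocessor is fitted on X_valid by itself instead of on X_train. The following is a minimal sketch of that effect using scikit-learn only. The toy data frames and the make_preprocessor helper are hypothetical, made up for illustration; the column names are borrowed from the error message, and XGBRegressor is left out so the example runs without XGBoost installed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric and one categorical column.
X_train = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, np.nan, 11250.0],
    "Street": ["Pave", "Grvl", "Pave", np.nan],
})
X_valid = pd.DataFrame({
    "LotArea": [9550.0, np.nan],
    "Street": ["Pave", "Pave"],  # the category "Grvl" never appears here
})

numeric_cols = ["LotArea"]
cat_cols = ["Street"]


def make_preprocessor():
    """Build an unfitted preprocessor like the one in the question (illustrative helper)."""
    cat_transfer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(transformers=[
        ("num", SimpleImputer(), numeric_cols),
        ("cat", cat_transfer, cat_cols),
    ])


# Fit on the training data only, then reuse the fitted transformer for both sets.
preprocessor = make_preprocessor().fit(X_train)
train_matrix = preprocessor.transform(X_train)
valid_matrix = preprocessor.transform(X_valid)

# Refitting on the validation data alone learns a different set of categories,
# so the number of one-hot columns no longer matches the training matrix.
refit_matrix = make_preprocessor().fit_transform(X_valid)

print(train_matrix.shape, valid_matrix.shape, refit_matrix.shape)
```

With this toy data the train-fitted preprocessor gives three columns for both sets, while the preprocessor refitted on X_valid gives only two, which is the kind of mismatch a model trained on the first layout would reject. Separately, in recent XGBoost releases early_stopping_rounds and eval_metric are, to my understanding, set on the XGBRegressor constructor rather than passed to fit(), so check the version you have installed.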