Here's an interesting problem: I have a GridSearchCV result, and after cherry-picking a row from its cv_results_ attribute I end up with the following:
Input: pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params']
Output: {'rf__max_depth': 40, 'rf__max_features': 2, 'rf__n_estimators': 310}
Now, as far as I know, the imbalanced-learn package's Pipeline object is a wrapper around scikit-learn's Pipeline, and its .fit() method should accept **fit_params, like so:
clf = BalancedRandomForestClassifier(random_state=random_state,
                                     n_jobs=n_jobs)
pipeline = Pipeline([('nt', nt), ('rf', clf)])
pipeline.fit(X_train, y_train, **pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params'])
However, when I execute the expression above, I get the following:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-64-a26424dc8038> in <module>
4 pipeline = Pipeline([('nt', nt), ('rf', clf)])
5
----> 6 pipeline.fit(X_train, y_train, **pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params'])
7
8 print_scores(pipeline, X_train, y_train, X_test, y_test)
/opt/conda/lib/python3.7/site-packages/imblearn/pipeline.py in fit(self, X, y, **fit_params)
237 Xt, yt, fit_params = self._fit(X, y, **fit_params)
238 if self._final_estimator is not None:
--> 239 self._final_estimator.fit(Xt, yt, **fit_params)
240 return self
241
TypeError: fit() got an unexpected keyword argument 'max_features'
Any idea what I'm doing wrong?
Answer 0 (score: 1)
Why are you feeding a DataFrame of model-construction parameters to the .fit() method? It only needs the two arguments X and y. You need to pass the model's parameters to the BalancedRandomForestClassifier constructor instead. Since your parameter names (with the rf__ prefix) don't match the names BalancedRandomForestClassifier uses, you need to enter them manually, like this:
clf = BalancedRandomForestClassifier(max_depth = 40, max_features = 2, n_estimators = 310, random_state = random_state, n_jobs = n_jobs)
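If you'd rather not retype the values, a minimal sketch (assuming the same cherry-picked params dictionary from the question is available) is to strip the rf__ step prefix and unpack the result into the constructor:

# Hypothetical variant: remove the "rf__" pipeline prefix so the keys
# match the constructor's argument names, then unpack them.
params = pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params']
clf_params = {name.split('__', 1)[1]: value for name, value in params.items()}
clf = BalancedRandomForestClassifier(random_state=random_state,
                                     n_jobs=n_jobs,
                                     **clf_params)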
Hope this helps!
Answer 1 (score: 1)
Let's say you have come up with a set of hyperparameters that looks like this:
hyper_params= {'rf__max_depth': 40, 'rf__max_features': 2, 'rf__n_estimators': 310}
As @Parthasarathy Subburaj mentioned, these are not fit_params. We can use .set_params() to set these parameters on the classifier inside the pipeline:
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.datasets import make_classification
from imblearn.pipeline import Pipeline
X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=4, weights=[0.2, 0.3, 0.5],
                           random_state=0)
clf = BalancedRandomForestClassifier(random_state=0)
pipeline = Pipeline([('rf', clf)])
hyper_params= {'rf__max_depth': 40, 'rf__max_features': 2, 'rf__n_estimators': 310}
pipeline.set_params(**hyper_params)
pipeline.fit(X,y)
# Output:
Pipeline(memory=None,
steps=[('rf',
BalancedRandomForestClassifier(bootstrap=True,
class_weight=None,
criterion='gini', max_depth=40,
max_features=2,
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_samples_leaf=2,
min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators=310, n_jobs=1,
oob_score=False, random_state=0,
replacement=False,
sampling_strategy='auto',
verbose=0, warm_start=False))],
verbose=False)
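Note that the keys in cv_results_ already carry the rf__ step prefix, so (assuming the two-step pipeline and the cherry-picked row from the question) the dictionary can be passed straight to set_params before fitting:

# Assumes grid_clf_rf, X_train, y_train and the row index 4966 from the question.
# set_params understands the step-prefixed keys; fit's **fit_params does not.
pipeline.set_params(**pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params'])
pipeline.fit(X_train, y_train)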