My goal: use SelectKBest with k="all" to rank all the features (simple, and done), then use that KBest-all ranking in place of RFECV (easy). Yes, I could loop over every k: "transform" the data to keep only the k best features, compute the cross-validated performance for each k, and finally collect all the scores and plot them. I would like to avoid writing that code myself, though. I am expecting a standard answer here; I would guess such a wrapper function must already exist in the excellent scikit-learn library. Could GridSearchCV perhaps be used for this?
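For what it's worth, GridSearchCV can indeed do this without a custom loop, by searching over the k parameter of a SelectKBest step inside a Pipeline. A minimal sketch (the toy data from make_classification and the step names are mine, for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([("kbest", SelectKBest(f_classif)), ("clf", SVC())])
grid = GridSearchCV(pipe,
                    {"kbest__k": range(1, X.shape[1] + 1)},
                    scoring="f1_macro", cv=5)
grid.fit(X, y)

# Mean cross-validated score for each k, in the order of the param range,
# ready to plot against the number of selected features
scores = grid.cv_results_["mean_test_score"]
print(grid.best_params_)
```

Note this refits SelectKBest for every k rather than ranking once, so it is somewhat more expensive than the ranking-based approach, but it is all stock scikit-learn.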
Answer 0: (score: 1)
I did not find a standard solution, so here is the pseudocode of what I did (happy to provide a working Jupyter example if anyone is interested):
def get_sorted_kbest_feature_keys(kbest_fitted_model):
    # Feature indices sorted by their SelectKBest score, best first
    return [fkey for fkey, _ in sorted(enumerate(kbest_fitted_model.scores_), key=lambda pair: pair[1], reverse=True)]

def select_features_transformer_function(X, **kwargs):
    selected_feature_keys = kwargs["selected_feature_keys"]
    X_new = X[:, selected_feature_keys]
    # apply other transformers as desired
    return X_new
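A quick sanity check of the transformer helper above, showing that it keeps exactly the requested columns in the requested order (the toy array is mine, for illustration):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def select_features_transformer_function(X, **kwargs):
    # Keep only the columns listed in selected_feature_keys
    return X[:, kwargs["selected_feature_keys"]]

X = np.arange(12).reshape(3, 4)  # 3 samples, 4 features
t = FunctionTransformer(select_features_transformer_function,
                        kw_args={"selected_feature_keys": [2, 0]})
X_selected = t.fit_transform(X)
print(X_selected)  # columns 2 and 0, in that order
```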
-
kbest = SelectKBest(score_func, k="all")  # score_func like f_classif or chi2
kbest.fit(X, y)
sorted_kbest_feature_keys = get_sorted_kbest_feature_keys(kbest)

scores = []
for num_selected_kbest_features in range(1, num_features + 1):
    selected_feature_keys = sorted_kbest_feature_keys[:num_selected_kbest_features]
    my_transformer = FunctionTransformer(select_features_transformer_function, accept_sparse=True, kw_args={"selected_feature_keys": selected_feature_keys})
    classifier = SVC()  # example: any classifier works here
    estimator = make_pipeline(my_transformer, classifier)
    cv_scores = cross_val_score(estimator, X, y, scoring=scoring_name, verbose=True, n_jobs=-1)  # scoring_name like "f1_macro"
    scores.append(cv_scores.mean())

# Then I can plot the scores as in:
# http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_with_cross_validation.html#sphx-glr-auto-examples-feature-selection-plot-rfe-with-cross-validation-py
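Since the answer offers a working example, here is the pseudocode above wired into a self-contained runnable sketch; the toy data from make_classification, the f_classif score function, the SVC classifier, and the "f1_macro" scoring name are my assumptions to make it executable:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC

def get_sorted_kbest_feature_keys(kbest_fitted_model):
    # Feature indices sorted by their SelectKBest score, best first
    return [fkey for fkey, _ in sorted(enumerate(kbest_fitted_model.scores_),
                                       key=lambda pair: pair[1], reverse=True)]

def select_features_transformer_function(X, **kwargs):
    return X[:, kwargs["selected_feature_keys"]]

X, y = make_classification(n_samples=150, n_features=8, random_state=0)
num_features = X.shape[1]

# Rank all features once with SelectKBest
kbest = SelectKBest(f_classif, k="all").fit(X, y)
sorted_keys = get_sorted_kbest_feature_keys(kbest)

# Cross-validate a classifier on the top-k features for every k
scores = []
for k in range(1, num_features + 1):
    transformer = FunctionTransformer(select_features_transformer_function,
                                      kw_args={"selected_feature_keys": sorted_keys[:k]})
    estimator = make_pipeline(transformer, SVC())
    scores.append(cross_val_score(estimator, X, y, scoring="f1_macro", cv=5).mean())

print(scores)  # one mean CV score per number of selected features
```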