scikit-learn: compute & plot recursive KBest feature (k="all") performance

Date: 2017-01-30 19:54:43

Tags: python machine-learning scikit-learn feature-extraction bigdata

My goals:

  • Use SelectKBest (KBest) with k="all" to rank and sort all features. (easy and done)
  • Plot the recursive/progressive cross-validation performance of the sorted features (pretty much as in this example for recursive feature elimination, RFECV), i.e. 1) compute the CV performance of the top feature alone, 2) then of the top + second-best, 3) then + the third, ... n) all features together. (somewhat laborious)
  • Plot the results as in the figure below (just with the sorted KBest-all features instead of RFECV). (easy)

Yes, I could loop k over all ranked features, "transform" the data each time to keep only the k best features, compute the cross-validation performance for each subset, and finally collect all scores and plot them... but I would like to avoid writing that code myself.

I am expecting a standard answer; I would guess such a wrapper function must already exist in the excellent scikit-learn library.

Perhaps GridSearchCV can be used for this?
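One way the GridSearchCV idea could work is to wrap SelectKBest in a Pipeline and grid-search over k, so each value of k is cross-validated automatically. This is a sketch, not a confirmed answer from the thread; the dataset, classifier, and step names are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("kbest", SelectKBest(f_classif)),  # ranks features by ANOVA F-score
    ("clf", SVC()),                     # any classifier works here
])

# Try every possible number of top-ranked features.
param_grid = {"kbest__k": list(range(1, X.shape[1] + 1))}
search = GridSearchCV(pipe, param_grid, scoring="f1_macro", cv=5)
search.fit(X, y)

# One mean CV score per k, ready to plot against k.
mean_scores = search.cv_results_["mean_test_score"]
```

Note one subtle difference from the pseudocode in the answer below: inside GridSearchCV, SelectKBest re-ranks the features on each training fold, rather than ranking once on the full data, which avoids leaking test-fold information into the ranking.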


1 Answer:

Answer 0 (score: 1)

I did not find a standard solution, so here is pseudocode of what I did:

(Happy to provide a working Jupyter example if there is interest.)

def get_sorted_kbest_feature_keys(kbest_fitted_model):
    # Feature indices sorted by KBest score, best first.
    return [fkey for fkey, _ in sorted(enumerate(kbest_fitted_model.scores_), key=lambda pair: pair[1], reverse=True)]

def select_features_transformer_function(X, **kwargs):
    selected_feature_keys = kwargs["selected_feature_keys"]

    X_new = X[:, selected_feature_keys]
    # apply other transformers as desired

    return X_new

-

kbest = SelectKBest(score_func, k="all")  # score_func like f_classif or chi2
kbest.fit(X, y)
sorted_kbest_feature_keys = get_sorted_kbest_feature_keys(kbest)

scores = []

for num_selected_kbest_features in range(1, num_features + 1):  # num_features = X.shape[1]

    selected_feature_keys = sorted_kbest_feature_keys[:num_selected_kbest_features]
    my_transformer = FunctionTransformer(select_features_transformer_function, accept_sparse=True, kw_args={"selected_feature_keys": selected_feature_keys})

    classifier = SVC()  # or any other estimator
    estimator = make_pipeline(my_transformer, classifier)

    cv_scores = cross_val_score(estimator, X, y, scoring=scoring_name, verbose=True, n_jobs=-1)  # scoring_name like "f1_macro"
    scores.append(cv_scores.mean())

# Then I can plot the scores as in:

### http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_with_cross_validation.html#sphx-glr-auto-examples-feature-selection-plot-rfe-with-cross-validation-py
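For reference, the pseudocode above can be turned into a self-contained runnable version roughly as follows. The iris dataset and SVC classifier are illustrative assumptions standing in for the unspecified data and estimator.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC

def get_sorted_kbest_feature_keys(kbest_fitted_model):
    # Feature indices sorted by KBest score, best first.
    return list(np.argsort(kbest_fitted_model.scores_)[::-1])

def select_features_transformer_function(X, selected_feature_keys=None):
    # Keep only the selected feature columns.
    return X[:, selected_feature_keys]

X, y = load_iris(return_X_y=True)

kbest = SelectKBest(f_classif, k="all").fit(X, y)
sorted_kbest_feature_keys = get_sorted_kbest_feature_keys(kbest)

scores = []
for n in range(1, X.shape[1] + 1):
    transformer = FunctionTransformer(
        select_features_transformer_function,
        kw_args={"selected_feature_keys": sorted_kbest_feature_keys[:n]},
    )
    estimator = make_pipeline(transformer, SVC())
    scores.append(cross_val_score(estimator, X, y, scoring="f1_macro").mean())

# scores[i] is the mean CV score using the top (i + 1) KBest features,
# which can be plotted exactly like the RFECV example linked above.
```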