Question

I use f_classif to compute the scores of my features:

from sklearn import feature_selection

def select_feature_anova(x, y, data):
    anova = feature_selection.f_classif(x, y)
    threshold = 10
    # How to build x_new?

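What is the simplest way to turn x into x_new so that it contains only the features whose score is above the threshold? I would also like to exclude features whose score is NaN.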

Answer 1

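Based on the documentation, f_classif returns the F-scores (and p-values) for each feature, so we can filter the features by F-score directly.

Try this!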

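from sklearn.feature_selection import f_classif
import numpy as np

# Silence NumPy warnings from 0/0 divisions (constant features get a NaN score)
np.seterr(divide='ignore', invalid='ignore')

def select_feature_anova(X, y, threshold=10):
    # f_classif returns (F-scores, p-values); only the F-scores are needed here
    F, _ = f_classif(X, y)
    # NaN > threshold evaluates to False, so NaN-scored features are dropped too
    X_new = X[:, F > threshold]
    return X_new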
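
For illustration, here is a minimal sketch of the same idea with the NaN exclusion written out explicitly and the warnings suppressed only locally rather than globally. The function name `select_by_f_score`, the returned mask, and the toy data are assumptions made for this example and are not part of the original answer. It also assumes X is (or can be converted to) a NumPy array.

```python
import warnings

import numpy as np
from sklearn.feature_selection import f_classif


def select_by_f_score(X, y, threshold=10.0):
    """Keep the columns of X whose ANOVA F-score is above `threshold`.

    Hypothetical helper for illustration; features with a NaN score are dropped.
    """
    X = np.asarray(X)
    # Constant features produce 0/0 inside f_classif; silence that locally.
    with warnings.catch_warnings(), np.errstate(divide="ignore", invalid="ignore"):
        warnings.simplefilter("ignore")
        F, _ = f_classif(X, y)
        # Explicit NaN check, although NaN > threshold is already False.
        mask = ~np.isnan(F) & (F > threshold)
    return X[:, mask], mask


# Toy data: column 0 separates the classes, column 1 does not,
# column 2 is constant and therefore gets a NaN score.
X_demo = np.array([
    [1.0, 1.0, 7.0],
    [1.1, 2.0, 7.0],
    [0.9, 3.0, 7.0],
    [5.0, 1.0, 7.0],
    [5.1, 2.0, 7.0],
    [4.9, 3.0, 7.0],
])
y_demo = np.array([0, 0, 0, 1, 1, 1])

X_new, mask = select_by_f_score(X_demo, y_demo, threshold=10)
print(mask)         # which features were kept
print(X_new.shape)  # shape of the reduced matrix
```

Returning the boolean mask alongside X_new makes it possible to apply the same column selection to a held-out test set later.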

sklearn: use f_classif and select the features whose score is above a threshold
