Ranking all features with scikit-learn

Asked: 2019-03-11 08:50:22

Tags: scikit-learn feature-selection

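I am trying to rank all features using scikit-learn's SelectKBest with f_regression. This works well when the number of selected features k is smaller than the total number of features n. However, if I set k = n, the output of SelectKBest comes back in the same order as the columns of the original feature array. How can I sort all features by their importance?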

Here is the code:

from sklearn.feature_selection import SelectKBest, f_regression

n = len(training_features.columns)

selector = SelectKBest(f_regression, k = n)
selector.fit(training_features.values, training_targets.values[:, 0])

k_best_features = list(training_features.columns[selector.get_support(indices = True)])

2 Answers:

Answer 0: (score: 0)

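I think you can sort the features by the scores that f_regression assigns to them, which the fitted selector exposes as selector.scores_: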
import pandas as pd

# One row per feature, highest f_regression score first
pd.DataFrame(dict(feature_names=training_features.columns, scores=selector.scores_))\
    .sort_values('scores', ascending=False)
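
To make the behaviour concrete, here is a minimal self-contained sketch on synthetic data. The column names "a", "b", "c" and the coefficients are made up for illustration and are not from the question. It shows that get_support(indices=True) reports the selected columns in their original order, which is why k = n looks unsorted, while sorting on selector.scores_ gives the actual ranking.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data (an assumption for illustration): "c" drives the target most,
# "a" contributes less, and "b" is pure noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["a", "b", "c"])
y = 2.0 * X["a"] + 5.0 * X["c"] + rng.normal(scale=0.1, size=500)

# k="all" keeps every feature, equivalent to k = n in the question
selector = SelectKBest(f_regression, k="all").fit(X.values, y.values)

# The support is a mask over the original columns, so the indices come back
# in column order, not in score order.
support_idx = selector.get_support(indices=True)

# Ranking by the per-feature scores stored on the fitted selector
ranking = (
    pd.DataFrame({"feature": X.columns, "score": selector.scores_})
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)
ranked_names = ranking["feature"].tolist()
print(ranking)
```

The ranking DataFrame keeps the score next to each name, which is handy if you also want to inspect how far apart the features are rather than only their order.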

Answer 1: (score: 0)

I ended up using this solution:

import numpy as np
from sklearn.feature_selection import f_regression

k = 10    # number of best features to obtain

scores, _ = f_regression(training_features.values, training_targets.values[:, 0])
indices = np.argsort(scores)[::-1]
k_best_features = list(training_features.columns.values[indices[0:k]])
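
One caveat with the argsort approach, sketched below with a hypothetical scores array rather than real f_regression output: np.argsort places NaN last in ascending order, so reversing it puts NaN entries first. Depending on the scikit-learn version and settings, f_regression may return NaN for a constant feature, so it can be worth pushing NaN scores to the bottom before ranking.

```python
import numpy as np

# Hypothetical scores and feature names, for illustration only
scores = np.array([3.0, np.nan, 10.0, 0.5])
names = np.array(["f0", "f1", "f2", "f3"])

# Naive descending order: the NaN entry ends up ranked first
naive = np.argsort(scores)[::-1]

# Treat NaN as the worst possible score so it sorts to the end
safe_scores = np.where(np.isnan(scores), -np.inf, scores)
indices = np.argsort(safe_scores)[::-1]
ranked = names[indices].tolist()
print(ranked)
```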