Question

I'm using scikit-learn's SelectKBest to select the best features, about 500 out of 900, as shown below, where d is the DataFrame containing all the features.

from sklearn.feature_selection import SelectKBest, chi2, f_classif
X_new = SelectKBest(chi2, k=491).fit_transform(d, label_vs)

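When I print X_new, it only gives me numbers, but I need the names of the selected features so I can use them later.

I tried things like X_new.dtype.names but got nothing back, and I tried converting X_new to a DataFrame, but the only column names I got were

1, 2, 3, 4...

Is there a way to find out the names of the selected features?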

Answer 1

Here is how to do it with get_support():

chY = SelectKBest(chi2, k=491)
X_new = chY.fit_transform(d, label_vs)
# get_support() returns one boolean per original column; keep the names marked True
column_names = [name for name, keep in zip(d.columns, chY.get_support()) if keep]

Following @AI_Learning's answer, you can also get the column names like this:

column_names = d.columns[chY.get_support()]
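The two approaches above should give the same list. A minimal, self-contained sketch to check that, assuming hypothetical random stand-ins for `d` and `label_vs` (chi2 needs non-negative feature values, so the toy data uses small integers). It also shows `get_support(indices=True)`, which returns column positions instead of a boolean mask:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy data standing in for the asker's d and label_vs.
# chi2 requires non-negative features, hence integers in [0, 10).
rng = np.random.default_rng(0)
d = pd.DataFrame(rng.integers(0, 10, size=(100, 6)),
                 columns=[f"col_{i}" for i in range(6)])
label_vs = rng.integers(0, 2, size=100)

selector = SelectKBest(chi2, k=3)
X_new = selector.fit_transform(d, label_vs)

mask = selector.get_support()             # boolean array, one entry per column
idx = selector.get_support(indices=True)  # integer positions of the kept columns

names_zip = [name for name, keep in zip(d.columns, mask) if keep]
names_mask = list(d.columns[mask])        # boolean indexing on the column Index
names_idx = list(d.columns[idx])          # positional indexing on the column Index

print(names_zip)
```

Boolean indexing on `d.columns` is the shortest form; the positional version is handy when you also want to index into NumPy arrays.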

Answer 2

You can use the selector's .get_support() method to get the feature names from the original DataFrame:

feature_selector = SelectKBest(chi2, k=491)
feature_selector.fit(d, label_vs)  # get_support() only works on a fitted selector
d.columns[feature_selector.get_support()]

Working example:

from sklearn.datasets import load_digits
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
X, y = load_digits(return_X_y=True)
df = pd.DataFrame(X, columns=[f'feature {i}' for i in range(X.shape[1])])

feature_selector = SelectKBest(chi2, k=20)

X_new = feature_selector.fit_transform(df, y)
X_new.shape  # (1797, 20)

df.columns[feature_selector.get_support()]

Output:

Index(['feature 5', 'feature 6', 'feature 13', 'feature 19', 'feature 20',
       'feature 21', 'feature 26', 'feature 28', 'feature 30', 'feature 33',
       'feature 34', 'feature 41', 'feature 42', 'feature 43', 'feature 44',
       'feature 46', 'feature 54', 'feature 58', 'feature 61', 'feature 62'],
      dtype='object')
