Question

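I am currently working on feature selection. As part of it, I need to run a chi-squared test on the list of available features in a pandas DataFrame and determine which are the best 'n' features of that DataFrame.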

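From articles on the internet I understand that the value of 'n' is determined by the value we assign to the 'k' parameter of SelectKBest, which can be imported from sklearn.feature_selection.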

However, how can I find out the names or indices of the top 'n' features/columns selected by the chi-squared test?

To make this easier to follow, here is an example taken from this link (thanks to Chris Albon for providing a simple example on his website): https://chrisalbon.com/machine-learning/chi-squared_for_feature_selection.html

# Load libraries
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# Load iris data
iris = load_iris()

# Create features and target
X = iris.data
y = iris.target

# Convert to categorical data by converting data to integers
X = X.astype(int)

# Select two features with highest chi-squared statistics
chi2_selector = SelectKBest(chi2, k=2)
X_kbest = chi2_selector.fit_transform(X, y)
type(X_kbest)

# Show results
print('Original number of features:', X.shape[1])
print('Reduced number of features:', X_kbest.shape[1])

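As can be seen from the code, the input data is passed as a numpy array. Suppose the four columns are named Col_A, Col_B, Col_C, Col_D. The test selected the 3rd and 4th columns as the two best features. This can be seen by printing the value of "X_kbest":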

print(X_kbest)

[[1 0]
 [1 0]
 [1 0]
 ..., 
 [5 2]
 [5 2]
 [5 1]]
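
Besides reading the printed values, one way to check which columns were picked might be to look at the per-feature chi-squared scores stored on the fitted selector. The snippet below is only a sketch that reuses the setup from the example above; the variable names `scores` and `top_idx` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Same setup as the example above
iris = load_iris()
X = iris.data.astype(int)
y = iris.target

chi2_selector = SelectKBest(chi2, k=2)
X_kbest = chi2_selector.fit_transform(X, y)

# scores_ holds one chi-squared statistic per input column
scores = chi2_selector.scores_
print(scores)

# Positions of the two highest scores, sorted into column order
top_idx = np.sort(np.argsort(scores)[-2:])
print(top_idx)

# The reduced array should match those columns of the original data
print(np.array_equal(X_kbest, X[:, top_idx]))
```

On this data the two highest scores belong to the 3rd and 4th columns (zero-based indices 2 and 3), which agrees with the printed output.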

But I need the output as a list that contains only the selected feature names (in this case, Col_C and Col_D), or the feature names together with the data.
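
A possible approach, sketched here under the assumption that the data lives in a pandas DataFrame with the column names Col_A to Col_D used above, is to take the boolean mask returned by the selector's `get_support()` method and use it to index the DataFrame's columns. The names `df`, `mask`, `selected_names` and `df_kbest` are illustrative and not part of the original example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()

# Assumed column names, taken from the question text
df = pd.DataFrame(iris.data.astype(int),
                  columns=["Col_A", "Col_B", "Col_C", "Col_D"])
y = iris.target

# Fit the selector on the DataFrame
chi2_selector = SelectKBest(chi2, k=2)
chi2_selector.fit(df, y)

# Boolean mask: True for each column the selector kept
mask = chi2_selector.get_support()

# List of the selected feature names
selected_names = df.columns[mask].tolist()
print(selected_names)

# Selected feature names together with the data
df_kbest = df.loc[:, mask]
print(df_kbest.head())
```

Calling `get_support(indices=True)` instead returns the integer positions of the selected columns, which may be more convenient when the input is a plain numpy array without names.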

Python - How to determine Chi Squared test
