How to use the Kruskal-Wallis test for feature selection in Python?

Asked: 2019-06-20 13:50:04

Tags: python-3.x scikit-learn

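I am using a pipeline for a machine learning classification task so that feature selection and normalization are performed inside each fold. I would like to try the Kruskal-Wallis test for feature selection, because some of my features are non-Gaussian. However, since I want the selection to happen within each fold, I am not sure how to code it. Has anyone done this? I considered adapting the source code of f_classif, which uses ANOVA (https://searchcode.com/codesearch/view/2154679/), but unfortunately my coding skills are not good enough. Can anyone help? Here is my current code: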

from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif  # used by SelectKBest below
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_validate
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
import pandas as pd

df = pd.DataFrame({'length': [5, 8, 0.2, 10, 25, 3.2], 
               'width': [60, 102, 80.5, 30, 52, 81],
               'group': [1, 0, 0, 0, 1, 1]})

array = df.values
y = array[:,2]
X = array[:,0:2]

select = SelectKBest(f_classif, k=2)
scl = StandardScaler()
svm = SVC(kernel='linear', probability=True, random_state=42)
logr = LogisticRegression(random_state=42)

pipeline = Pipeline([('select', select), ('scale', scl), ('svm', svm)])

split = KFold(n_splits=2, shuffle=True, random_state=42)

output = cross_validate(pipeline, X, y, cv=split, 
            scoring = ('accuracy', 'f1', 'roc_auc'),
            return_estimator = True,
            return_train_score= True)
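
One possible way to get per-fold Kruskal-Wallis selection, sketched under assumptions rather than offered as a tested solution: SelectKBest accepts any score function that takes (X, y) and returns a pair of arrays (scores, p-values), so scipy.stats.kruskal can be wrapped feature by feature. The name kruskal_classif below is hypothetical (it is not part of scikit-learn), and the sketch assumes SciPy is installed. Note that scipy.stats.kruskal may raise an error or return NaN for a feature whose values are all identical, depending on the SciPy version, so constant features would need to be dropped beforehand.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def kruskal_classif(X, y):
    """Hypothetical score function: Kruskal-Wallis H statistic and
    p-value for each feature, with samples grouped by class label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_features = X.shape[1]
    scores = np.empty(n_features)
    pvalues = np.empty(n_features)
    for j in range(n_features):
        # One sample of feature values per class.
        groups = [X[y == c, j] for c in classes]
        scores[j], pvalues[j] = kruskal(*groups)
    return scores, pvalues


# Drop-in replacement for SelectKBest(f_classif, k=2) in the pipeline above.
# Because the selector is a pipeline step, cross_validate refits it on the
# training part of every fold.
select = SelectKBest(kruskal_classif, k=2)
pipeline = Pipeline([
    ('select', select),
    ('scale', StandardScaler()),
    ('svm', SVC(kernel='linear', probability=True, random_state=42)),
])
```

This pipeline can be passed to cross_validate exactly as in the code above. A higher H statistic means the class distributions of that feature differ more, which matches the "larger score is better" convention SelectKBest expects.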

0 answers:

No answers