Combining recursive feature elimination with nested (leave-one-group-out) cross-validation in scikit-learn

Date: 2018-12-16 22:54:44

Tags: python machine-learning scikit-learn

I want to perform binary classification on data from 30 groups of subjects, with 230 samples and 150 features. I am finding it hard to implement, in particular doing feature selection together with nested (leave-one-group-out) cross-validation over a parameter grid, reporting accuracy for two classifiers (SVM and random forest), and seeing which features were selected.

I am new to this, and I am sure the following code is not correct:

from sklearn.model_selection import LeaveOneGroupOut        
from sklearn.feature_selection import RFE    
from sklearn.model_selection import GridSearchCV    
from sklearn.model_selection import cross_val_score   
from sklearn.svm import SVC    
from sklearn.ensemble import RandomForestClassifier      


X= the data (230 samples * 150 features)      
y= [1,0,1,0,0,0,1,1,1..]   
groups = [1,2...30] 


param_grid = [{'estimator__C': [0.01, 0.1, 1.0, 10.0]}]   
inner_cross_validation = LeaveOneGroupOut().split(X, y, groups)   
outer_cross_validation = LeaveOneGroupOut().split(X, y, groups)    
estimator = SVC(kernel="linear")   
selector = RFE(estimator, step=1)    
grid_search = GridSearchCV(selector, param_grid, cv=inner_cross_validation)   
grid_search.fit(X, y)   
scores = cross_val_score(grid_search, X, y,cv=outer_cross_validation)

I don't know where to put the random forest classifier in the code above, because I want to compare the accuracy of the SVM against the random forest.

Thank you very much for reading, and I hope someone can help me.

Best regards

1 answer:

Answer 0 (score: 0)

You should call the tree in the same way that you call the SVM:

# your libraries
from sklearn.tree import DecisionTreeClassifier

# ....
estimator = SVC(kernel="linear")
estimator2 = DecisionTreeClassifier(...parameters here...)


selector = RFE(estimator, step=1)
selector2 = RFE(estimator2, step=1)


grid_search = GridSearchCV(selector, param_grid, cv=inner_cross_validation)
grid_search2 = GridSearchCV(selector2, ...param grid for the tree here..., cv=inner_cross_validation)
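To actually compare the accuracy of the two models with leave-one-group-out at both levels, one possible way (a minimal sketch, not tested on your data) is to run the outer loop by hand, so that both the inner and the outer cross-validation respect the groups. The sketch assumes X, y and groups are NumPy arrays shaped as in the question, uses RandomForestClassifier (as asked in the question) instead of a decision tree, and the parameter grids are illustrative only:

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, GridSearchCV
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

logo = LeaveOneGroupOut()

# one (RFE selector, parameter grid) pair per model; the grids are illustrative only
models = {
    "svm": (RFE(SVC(kernel="linear"), step=1),
            {"estimator__C": [0.01, 0.1, 1.0, 10.0]}),
    "rf": (RFE(RandomForestClassifier(random_state=0), step=1),
           {"estimator__n_estimators": [100, 300]}),
}

scores = {name: [] for name in models}
for train_idx, test_idx in logo.split(X, y, groups):  # outer CV: one group held out per fold
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]
    # inner CV: leave-one-group-out over the remaining training groups
    inner_cv = list(LeaveOneGroupOut().split(X_tr, y_tr, groups[train_idx]))
    for name, (selector, grid) in models.items():
        gs = GridSearchCV(selector, grid, cv=inner_cv)
        gs.fit(X_tr, y_tr)                         # RFE + grid search on the training groups only
        scores[name].append(gs.score(X_te, y_te))  # accuracy on the held-out group

for name in models:
    print(name, np.mean(scores[name]))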

Note that this procedure will give you two different sets of selected features: one for the SVM and one for the decision tree.
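To see which features each model ended up keeping, a fitted RFE exposes a boolean support_ mask and a ranking_ array, and inside GridSearchCV the refitted selector is available as best_estimator_. A small sketch, assuming gs is one of the fitted grid searches from the loop above:

import numpy as np

best_rfe = gs.best_estimator_                  # the RFE selector refitted with the best parameters
selected_idx = np.where(best_rfe.support_)[0]  # column indices of the features RFE kept
print(selected_idx)
print(best_rfe.ranking_)                       # rank 1 = kept; larger ranks were eliminated earlier

Because the selector is refitted inside every outer fold, the kept features can also differ from fold to fold; a common way to report this is to count how often each feature is selected across the 30 folds.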