I want to do binary classification on 30 groups of subjects, with 230 samples and 150 features. I am finding it hard to implement, in particular the nested cross-validation over the parameter set while doing feature selection, reporting accuracy with two classifiers (SVM and random forest), and seeing which features were selected.
I am new to this, and I am sure the following code is not correct:
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
X = ...  # the data (230 samples x 150 features)
y = [1, 0, 1, 0, 0, 0, 1, 1, 1, ...]  # binary labels, one per sample
groups = [1, 2, ..., 30]  # group label for each sample
param_grid = [{'estimator__C': [0.01, 0.1, 1.0, 10.0]}]
inner_cross_validation = LeaveOneGroupOut().split(X, y, groups)
outer_cross_validation = LeaveOneGroupOut().split(X, y, groups)
estimator = SVC(kernel="linear")
selector = RFE(estimator, step=1)
grid_search = GridSearchCV(selector, param_grid, cv=inner_cross_validation)
grid_search.fit(X, y)
scores = cross_val_score(grid_search, X, y, cv=outer_cross_validation)
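(One concrete pitfall in the snippet above, for anyone reading along: `LeaveOneGroupOut().split(...)` returns a one-shot generator, so after `grid_search.fit(X, y)` consumes `inner_cross_validation`, the clones refit inside `cross_val_score` see an exhausted iterator. A minimal demo on toy arrays, not the real data:)

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Toy stand-ins: 6 samples in 3 groups of 2.
X = np.zeros((6, 2))
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3])

splits = LeaveOneGroupOut().split(X, y, groups)
print(len(list(splits)))  # 3 -- one fold per group
print(len(list(splits)))  # 0 -- the generator is already exhausted
```

Materialising the splits with `list(...)`, or passing the `LeaveOneGroupOut` object itself together with `groups=` where the API accepts it, avoids the silent exhaustion.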
I don't know where to place the RandomForestClassifier in the above, because I want to compare accuracy between the SVM and the random forest.
Thank you very much for reading, and I hope someone can help me.
Best regards
Answer 0 (score: 0)
You should set up the tree in the same way you set up the SVM:
#your libraries
from sklearn.tree import DecisionTreeClassifier
#....
estimator = SVC(kernel="linear")
estimator2 = DecisionTreeClassifier( ...parameters here...)
selector = RFE(estimator, step=1)
selector2 = RFE(estimator2, step=1)
grid_search = GridSearchCV(selector, param_grid, cv=inner_cross_validation)
grid_search2 = GridSearchCV(selector2, ..grid for the tree here.., cv=inner_cross_validation)
Note that this procedure will produce two different sets of selected features: one for the SVM and one for the decision tree.
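To tie this back to the question's actual setup (RandomForestClassifier rather than a single tree, and leave-one-group-out on the outside), here is a minimal runnable sketch on small synthetic data. The shapes, seeds, and parameter grids are illustrative only; the real data would be 230 x 150 with 30 groups. The inner loop deliberately uses a plain 3-fold CV, because `cross_val_score` does not forward `groups` into the inner `GridSearchCV` without extra metadata-routing setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

# Small synthetic stand-in for the real data (shapes shrunk for speed).
rng = np.random.RandomState(0)
X = rng.randn(60, 10)
y = rng.randint(0, 2, 60)
groups = np.repeat(np.arange(6), 10)  # 6 groups of 10 samples each

outer_cv = LeaveOneGroupOut()

# SVM: tune C of the estimator wrapped inside RFE.
svm_search = GridSearchCV(
    RFE(SVC(kernel="linear"), n_features_to_select=5, step=1),
    {"estimator__C": [0.01, 0.1, 1.0, 10.0]},
    cv=3,  # plain inner k-fold; see the caveat about groups above
)

# Random forest: RFE also works here, since the forest
# exposes feature_importances_ for the elimination ranking.
rf_search = GridSearchCV(
    RFE(RandomForestClassifier(random_state=0), n_features_to_select=5, step=1),
    {"estimator__n_estimators": [10, 25]},
    cv=3,
)

# Outer leave-one-group-out loop, one score per held-out group.
svm_scores = cross_val_score(svm_search, X, y, cv=outer_cv, groups=groups)
rf_scores = cross_val_score(rf_search, X, y, cv=outer_cv, groups=groups)
print(f"SVM accuracy: {svm_scores.mean():.3f} +/- {svm_scores.std():.3f}")
print(f"RF  accuracy: {rf_scores.mean():.3f} +/- {rf_scores.std():.3f}")

# To see which features were kept, fit the tuned selector once on all data.
svm_search.fit(X, y)
print("SVM-selected features:", np.where(svm_search.best_estimator_.support_)[0])
```

As the answer notes, the two classifiers will generally keep different feature subsets; the `support_` mask on each fitted `best_estimator_` shows which ones.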