I am trying to perform recursive feature elimination with cross-validation (RFECV) inside a GridSearchCV, with SVC as the classifier. My code is as follows.
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.svm import SVC
from sklearn.feature_selection import RFECV

# df and my_features are defined elsewhere
X = df[my_features]
y = df['gold_standard']

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

clf = SVC(class_weight="balanced")
rfecv = RFECV(estimator=clf, step=1, cv=k_fold, scoring='roc_auc')

param_grid = {'estimator__C': [0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 1.0, 10.0, 100.0, 1000.0],
              'estimator__gamma': [0.001, 0.01, 0.1, 1.0, 2.0, 3.0, 10.0, 100.0, 1000.0],
              'estimator__kernel': ('rbf', 'sigmoid', 'poly')}

CV_rfc = GridSearchCV(estimator=rfecv, param_grid=param_grid, cv=k_fold, scoring='roc_auc', verbose=10)
CV_rfc.fit(x_train, y_train)
However, I get an error: RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes.
Is there a way to resolve this error? If not, what other feature selection techniques can I use with SVC? I am happy to provide more details if needed.
Answer 0 (score: 2)
For more feature selection implementations, have a look at:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
As an example, the scikit-learn docs combine PCA and k-best univariate feature selection with an SVC. A simplified version of that example:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

iris = load_iris()
X, y = iris.data, iris.target

# Maybe some original features were good, too?
selection = SelectKBest()

# Build SVC
svm = SVC(kernel="linear")

# Do grid search over k and C:
pipeline = Pipeline([("features", selection), ("svm", svm)])

param_grid = dict(features__k=[1, 2],
                  svm__C=[0.1, 1, 10])

grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
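To then see which hyperparameters won and which features the best pipeline kept, a minimal follow-up sketch (assuming the fitted grid_search from above; SelectKBest exposes its selection mask via get_support()):

# Inspect the winning parameters and the features kept by the best pipeline
print(grid_search.best_params_)
best_selection = grid_search.best_estimator_.named_steps["features"]
print(best_selection.get_support())  # boolean mask over the input features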
Answer 1 (score: 0)
Hmm... in sklearn 0.19.2 this problem seems to be solved. My code is similar to yours, but it works (note the linear kernel, which does expose coef_):
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFECV

svc = SVC(kernel='linear',
          probability=True,
          random_state=1)
rfecv = RFECV(estimator=svc,
              scoring='roc_auc')

# train_values / train_Labels hold the training data and labels
rfecv.fit(train_values, train_Labels)

# Boolean mask of the selected features, and their column indices
selecInfo = rfecv.support_
selecIndex = np.where(selecInfo)[0]
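Building on that, here is a minimal sketch (not from the answer; the C values are illustrative) of how the question's GridSearchCV-over-RFECV setup could work once the kernel is fixed to linear, since only C remains to tune:

from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

# A linear-kernel SVC exposes coef_, so RFECV can rank and eliminate features.
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
rfecv = RFECV(estimator=SVC(kernel='linear', class_weight='balanced'),
              step=1, cv=k_fold, scoring='roc_auc')

# 'estimator__C' is routed through RFECV to the inner SVC.
param_grid = {'estimator__C': [0.01, 0.1, 1.0, 10.0, 100.0]}
CV_rfc = GridSearchCV(rfecv, param_grid=param_grid, cv=k_fold, scoring='roc_auc')
# CV_rfc.fit(x_train, y_train)  # using the x_train / y_train from the question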