Question

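Inspired by: http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_with_cross_validation.html#sphx-glr-auto-examples-feature-selection-plot-rfe-with-cross-validation-py

I would like to know whether there is a way to get the features that produce a particular score:

In this case, I would like to know which 10 features were selected to give that peak at #Features = 10.

Any ideas?

EDIT:

This is the code used to produce that plot: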

import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold, StratifiedKFold  # for K-fold cross-validation
from sklearn.ensemble import RandomForestClassifier  # Random Forest

# X is assumed to be a pandas DataFrame of features and y the class labels.
# The "accuracy" scoring is proportional to the number of correct classifications
# shuffle=True is required by recent scikit-learn releases when random_state is set
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)  # k=10, split the data into 10 parts
model_Linear_SVM = svm.SVC(kernel='linear', probability=True)
rfecv = RFECV(estimator=model_Linear_SVM, step=1, cv=kfold, scoring='accuracy')  # 10-fold cross-validation
rfecv = rfecv.fit(X, y)

print('Optimal number of features :', rfecv.n_features_)
print('Best features :', X.columns[rfecv.support_])
print('Original features :', X.columns)

# scikit-learn >= 1.0; older releases exposed this curve as rfecv.grid_scores_
mean_scores = rfecv.cv_results_["mean_test_score"]
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score \n of number of selected features")
plt.plot(range(1, len(mean_scores) + 1), mean_scores)
plt.show()

Answer 1

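First, you can see which features were selected at the point where the cross-validation score is highest (in your case that corresponds to 17 or 21 features; I can't tell from the plot) with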

rfecv.support_

or

rfecv.ranking_
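As a minimal sketch of how these two attributes can be read, the snippet below fits RFECV on synthetic data and maps the mask back to feature names. The dataset from make_classification and the names f0, f1, ... are assumptions for illustration only; they stand in for the asker's X and y.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data (an assumption, not the original X, y).
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])  # hypothetical names

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
rfecv = RFECV(estimator=SVC(kernel="linear"), step=1, cv=cv, scoring="accuracy")
rfecv.fit(X, y)

# support_ is a boolean mask over the original columns: True means "kept".
selected = feature_names[rfecv.support_]

# ranking_ assigns 1 to every kept feature; larger values were eliminated earlier.
ranking = dict(zip(feature_names.tolist(), rfecv.ranking_.tolist()))

print("Number of features kept:", rfecv.n_features_)
print("Selected features:", selected.tolist())
print("Ranking:", ranking)
```

With a pandas DataFrame, X.columns[rfecv.support_] plays the same role as feature_names[rfecv.support_] here.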

Then you can compute the importance of the selected features (for the peak of the CV score curve) via

np.absolute(rfecv.estimator_.coef_)

for a simple estimator, or

rfecv.estimator_.feature_importances_

if your estimator is an ensemble, such as a random forest.
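The two cases can be handled by one small helper, sketched below. The function name selected_importances, the synthetic data, and the choice to average absolute coefficients over the rows of coef_ are all assumptions made for illustration. Note that rfecv.estimator_ is refit on the selected columns only, so its importances line up with the features where support_ is True.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data (an assumption).
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)
names = np.array([f"f{i}" for i in range(X.shape[1])])  # hypothetical names


def selected_importances(rfecv, names):
    """Return (name, importance) pairs for the kept features, largest first."""
    est = rfecv.estimator_  # fitted on the selected columns only
    if hasattr(est, "feature_importances_"):
        imp = est.feature_importances_           # tree ensembles
    else:
        # coef_ has one row per class (or class pair); average |coef| over rows
        imp = np.abs(est.coef_).mean(axis=0)     # linear models
    kept = names[rfecv.support_]
    order = np.argsort(imp)[::-1]
    return [(str(kept[i]), float(imp[i])) for i in order]


svm_rfecv = RFECV(SVC(kernel="linear"), step=1, cv=3, scoring="accuracy").fit(X, y)
rf_rfecv = RFECV(RandomForestClassifier(n_estimators=50, random_state=0),
                 step=1, cv=3, scoring="accuracy").fit(X, y)

svm_pairs = selected_importances(svm_rfecv, names)
rf_pairs = selected_importances(rf_rfecv, names)
print("Linear SVM:", svm_pairs)
print("Random forest:", rf_pairs)
```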

You can then remove the least important features one at a time in a loop, recomputing rfecv on the remaining feature set each time.
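One way to reach a fixed count such as 10 without writing the loop by hand is scikit-learn's RFE class, which performs this kind of one-at-a-time elimination down to n_features_to_select. The sketch below is not the answerer's code; the synthetic data, the names f0, f1, ..., and the variable n_wanted are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data (an assumption).
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)
names = np.array([f"f{i}" for i in range(X.shape[1])])  # hypothetical names

n_wanted = 10  # the feature count of interest on the CV curve

# Eliminate the weakest feature one at a time until n_wanted remain.
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=n_wanted, step=1)
rfe.fit(X, y)
top_features = names[rfe.support_]
print("Features kept at", n_wanted, ":", top_features.tolist())

# Cross-validated accuracy on that fixed subset. The subset was chosen using
# all of the data, so this estimate can be optimistic compared with the RFECV curve.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(SVC(kernel="linear"), X[:, rfe.support_], y,
                         cv=cv, scoring="accuracy")
print("Mean CV accuracy:", scores.mean())
```

Keep in mind that each point on the RFECV curve is averaged over folds, and each fold may have kept a different set of 10 features, so the subset found this way is the full-data selection rather than the exact set behind any single fold's score.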

Getting the features in RFECV (scikit-learn)
