I am trying to apply RFECV to a KNeighborsClassifier to eliminate insignificant features. To make the problem reproducible, here is an example using the iris data:
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
y = iris.target
X = iris.data
estimator = KNeighborsClassifier()
selector = RFECV(estimator, step=1, cv=5)
selector = selector.fit(X, y)
This results in the following error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-27-19f0f2f0f0e7> in <module>()
7 estimator = KNeighborsClassifier()
8 selector = RFECV(estimator, step=1, cv=5)
----> 9 selector.fit(X, y)
C:...\Anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py in fit(self, X, y)
422 verbose=self.verbose - 1)
423
--> 424 rfe._fit(X_train, y_train, lambda estimator, features:
425 _score(estimator, X_test[:, features], y_test, scorer))
426 scores.append(np.array(rfe.scores_[::-1]).reshape(1, -1))
C:...\Anaconda3\lib\site-packages\sklearn\feature_selection\rfe.py in _fit(self, X, y, step_score)
180 coefs = estimator.feature_importances_
181 else:
--> 182 raise RuntimeError('The classifier does not expose '
183 '"coef_" or "feature_importances_" '
184 'attributes')
RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes
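The message states the requirement directly: RFE/RFECV ranks features using the `coef_` or `feature_importances_` attribute of each fitted estimator. The small check below is a sketch assuming a reasonably recent scikit-learn; `exposes_importances` is a hypothetical helper written only for illustration. It fits each estimator from this question and reports whether either attribute is available. Note that `SVC` only has `coef_` with a linear kernel.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def exposes_importances(estimator):
    """Hypothetical helper: fit the estimator and report whether it has
    either attribute that RFE/RFECV uses to rank features."""
    fitted = estimator.fit(X, y)
    # hasattr() is False both for missing attributes and for properties that
    # raise AttributeError (SVC.coef_ does this for non-linear kernels).
    return hasattr(fitted, "coef_") or hasattr(fitted, "feature_importances_")

print(exposes_importances(KNeighborsClassifier()))  # False
print(exposes_importances(SVC(kernel="linear")))    # True
print(exposes_importances(SVC(kernel="rbf")))       # False
```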
If I change the classifier to SVC:
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC
iris = load_iris()
y = iris.target
X = iris.data
estimator = SVC(kernel="linear")
selector = RFECV(estimator, step=1, cv=5)
selector = selector.fit(X, y)
it works fine. Any suggestions on how to solve this problem?
Note: I updated Anaconda yesterday, and sklearn along with it.
Answer 0 (score: 1)
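The error is quite self-explanatory: RFECV ranks features by the fitted estimator's coef_ or feature_importances_, and KNN provides neither, because it has no built-in notion of feature importance. You cannot use scikit-learn's KNN implementation for this unless you define your own feature importance measure for it. As far as I know there is no generally accepted measure of that kind, so scikit-learn does not implement one. A linear SVM (SVC(kernel="linear")), on the other hand, exposes coef_ like every linear model, which is why the second example works.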
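One way to "define your own feature importance measure for KNN" is to use permutation importance: shuffle one feature on held-out data and measure how much accuracy drops. The sketch below assumes scikit-learn 0.22 or later (for `sklearn.inspection.permutation_importance`) and uses a hypothetical helper, `knn_backward_elimination`, that performs RFE-style backward elimination by hand. Unlike RFECV, it uses a single validation split and a fixed target number of features rather than cross-validating the feature count.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_backward_elimination(X, y, n_features_to_select=2, random_state=0):
    """Hypothetical helper: repeatedly drop the least important feature,
    ranking features by permutation importance on a held-out split."""
    remaining = list(range(X.shape[1]))
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state)
    while len(remaining) > n_features_to_select:
        knn = KNeighborsClassifier().fit(X_train[:, remaining], y_train)
        result = permutation_importance(
            knn, X_val[:, remaining], y_val,
            n_repeats=10, random_state=random_state)
        # Remove the feature whose shuffling hurts validation accuracy least.
        weakest = int(np.argmin(result.importances_mean))
        remaining.pop(weakest)
    return sorted(remaining)

X, y = load_iris(return_X_y=True)
selected = knn_backward_elimination(X, y, n_features_to_select=2)
print("selected feature indices:", selected)

# Evaluate KNN on the reduced feature set with 5-fold cross-validation.
score = cross_val_score(KNeighborsClassifier(), X[:, selected], y, cv=5).mean()
print("5-fold CV accuracy on selected features: %.3f" % score)
```

Because the ranking depends on one random split, it is worth repeating with several `random_state` values before trusting the selected subset.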
Answer 1 (score: 1)
You may get a partial solution from the mlxtend library:
http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
See https://github.com/rasbt/mlxtend
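A sequential feature selector is a wrapper method: it only needs the estimator's cross-validated score, not `coef_` or `feature_importances_`, so it works with KNN. mlxtend's class has its own parameter names (see its user guide). As an assumption about your environment, scikit-learn 0.24 and later also ship a comparable `sklearn.feature_selection.SequentialFeatureSelector`, sketched below with forward selection. Setting `direction="backward"` removes features one at a time instead, which is closer in spirit to RFE.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Greedily add the feature that most improves 5-fold CV accuracy,
# stopping once two features have been selected.
sfs = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5)
sfs.fit(X, y)

print(sfs.get_support())   # boolean mask over the 4 iris features
X_selected = sfs.transform(X)
print(X_selected.shape)    # (150, 2)
```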
As for scikit-learn, see: