Question

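I am trying to estimate feature importances for a classification task I have at hand. What matters to me is getting a specific number that represents the importance of each feature, not just "pick the X most important features".

The obvious choice is a tree-based method, which exposes the convenient feature_importances_ attribute for reading off each feature's importance. But I am not satisfied with the results of tree-based classifiers. I learned that SelectFromModel can eliminate unimportant features based on their importance scores, and that it also works for SVMs and linear models.

Is there a way to get the specific importance score of each feature from SelectFromModel, rather than just the list of the most important features?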

Answer 1

Looking through the source code on GitHub, I found this piece of code:

def _get_feature_importances(estimator):
    """Retrieve or aggregate feature importances from estimator"""
    importances = getattr(estimator, "feature_importances_", None)

    if importances is None and hasattr(estimator, "coef_"):
        if estimator.coef_.ndim == 1:
            importances = np.abs(estimator.coef_)

        else:
            importances = np.sum(np.abs(estimator.coef_), axis=0)

    elif importances is None:
        raise ValueError(
            "The underlying estimator %s has no `coef_` or "
            "`feature_importances_` attribute. Either pass a fitted estimator"
            " to SelectFromModel or call fit before calling transform."
            % estimator.__class__.__name__)

    return importances

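So if you are using a linear model, the code simply uses the absolute values of the model coefficients (summed over the first axis when coef_ is 2-D) as the "importance scores".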
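To make the 2-D `coef_` branch of the helper quoted above concrete, here is a minimal sketch. The choice of the iris dataset and `LogisticRegression` is only an assumption for illustration; any multiclass linear model with a 2-D `coef_` should behave the same way.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Illustrative data only: 3 classes, 4 features
iris = load_iris()
X, y = iris.data, iris.target

clf = LogisticRegression(max_iter=1000).fit(X, y)

# For this multiclass model coef_ is 2-D: (n_classes, n_features)
print(clf.coef_.shape)

# Same aggregation as the `else` branch of the quoted helper:
# sum of absolute coefficients over classes, one score per feature
scores = np.sum(np.abs(clf.coef_), axis=0)

for name, score in zip(iris.feature_names, scores):
    print(f"{name}: {score:.4f}")
```

Note that coefficient magnitudes depend on feature scale, so these scores are usually only comparable across features after standardizing the inputs.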

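You can get these scores by extracting the coef_ attribute from the estimator that was passed to SelectFromModel; after fitting, it is available as estimator_.

Example: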

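from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

sfm = SelectFromModel(LassoCV(), threshold=0.25)  # threshold is keyword-only in recent versions
sfm.fit(X, y)  # X, y: your feature matrix and target
print(sfm.estimator_.coef_)  # print "importance" scores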
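For a version that runs end to end, the sketch below pairs each feature's score with the selection decision. The synthetic data from `make_regression` and the `threshold="mean"` setting are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Synthetic data (assumed for illustration): 10 features, 3 informative
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

sfm = SelectFromModel(LassoCV(cv=5), threshold="mean")
sfm.fit(X, y)

# Per-feature scores: absolute coefficients of the fitted inner estimator
scores = np.abs(sfm.estimator_.coef_)

# Boolean mask of the features SelectFromModel keeps
mask = sfm.get_support()

for i, (score, kept) in enumerate(zip(scores, mask)):
    print(f"feature {i}: score={score:.4f} selected={kept}")

# Numeric threshold that the scores were compared against
print("threshold:", sfm.threshold_)
```

For a tree-based inner estimator the same pattern should apply, reading `sfm.estimator_.feature_importances_` instead of `coef_`.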
