Target transformation and feature selection in scikit-learn

Time: 2019-09-29 13:39:22

Tags: python scikit-learn cross-validation feature-selection rfe

I am using RFECV for feature selection in scikit-learn. I would like to compare the results of a simple linear model (X, y) with those of a log-transformed model (using X, log(y)).

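Simple model: RFECV and cross_val_score give the same result (we need to compare the mean cross-validation score over all folds with the RFECV score for all features: 0.66 = 0.66, no problem, the results are reliable).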

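Log model, the problem: RFECV does not seem to provide a way to transform y. In this case the scores are 0.55 and 0.53. That is to be expected, though, because I had to apply np.log manually to fit the data: log_seletor = log_selector.fit(X,np.log(y)). The resulting r2 score is for y = log(y), with no inverse_func, whereas what we need is a way to fit the model on log(y_train) and compute the score using exp(y_test). Alternatively, if I try to use TransformedTargetRegressor, I get the error shown in the code: The classifier does not expose "coef_" or "feature_importances_" attributes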
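The difference between the two kinds of score can be made concrete with a short sketch. It assumes the same make_friedman1 data used in the answers below and a plain LinearRegression; the variable names are illustrative only. Fitting on np.log(y) directly scores r2 against log(y), whereas wrapping the regressor in TransformedTargetRegressor fits on log(y_train) but passes the predictions through np.exp before they are scored against the untransformed y_test.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Assumed example data (same generator as in the answers); y is positive here.
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# r2 measured in log space: the target itself is replaced by log(y).
log_space_scores = cross_val_score(
    LinearRegression(), X, np.log(y), cv=5, scoring="r2"
)

# r2 measured on the original scale: the model is fitted on log(y_train),
# and predictions are mapped back with exp before being compared to y_test.
ttr = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
original_scale_scores = cross_val_score(ttr, X, y, cv=5, scoring="r2")

print("mean r2 in log space:       ", round(np.mean(log_space_scores), 2))
print("mean r2 on original scale:  ", round(np.mean(original_scale_scores), 2))
```

The two means are not directly comparable, which is why scoring the log model inside RFECV needs the wrapped estimator rather than a manually transformed target.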

How can I resolve this problem and make sure the feature selection process is reliable?

(The code and output from the original question are not preserved in this copy.)

2 Answers:

Answer 0 (score: 3)

All you need to do is add these properties to TransformedTargetRegressor, for example through the subclass below, and then use that subclass in your code:

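from sklearn.compose import TransformedTargetRegressor

class MyTransformedTargetRegressor(TransformedTargetRegressor):
    # RFECV looks for one of these attributes on the estimator; forward
    # them from the fitted inner regressor (available only after fit).
    @property
    def feature_importances_(self):
        return self.regressor_.feature_importances_

    @property
    def coef_(self):
        return self.regressor_.coef_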
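As a minimal sketch of how this subclass might be plugged into RFECV: the make_friedman1 data, the LinearRegression inner regressor and the log/exp pair are assumptions taken from the rest of this thread, not something this answer prescribes.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression


class MyTransformedTargetRegressor(TransformedTargetRegressor):
    # Forward the attributes RFECV looks for from the fitted inner regressor.
    @property
    def feature_importances_(self):
        return self.regressor_.feature_importances_

    @property
    def coef_(self):
        return self.regressor_.coef_


# Assumed example data; y is positive, so np.log is safe here.
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

log_estimator = MyTransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)

# The target is passed untransformed; the wrapper applies log on fit and
# exp on predict, so r2 is computed on the original scale of y.
log_selector = RFECV(log_estimator, step=1, cv=5, scoring="r2")
log_selector.fit(X, y)

print("no of feat:", log_selector.n_features_)
```

Because the attributes are read-only properties, nothing is stored on the wrapper itself, so cloning inside RFECV should behave as it does for the unmodified TransformedTargetRegressor.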

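Answer 1 (score: 1)

One way to solve this is to make sure the coef_ attribute is exposed to the feature selection module RFECV. So, basically, you need to extend TransformedTargetRegressor and make sure it exposes coef_. I created a subclass that extends TransformedTargetRegressor and also exposes coef_, as shown below.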

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn import linear_model
from sklearn.model_selection import cross_val_score
from sklearn.compose import TransformedTargetRegressor
import numpy as np

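class myestimator(TransformedTargetRegressor):

    # Declare the parameters explicitly so that get_params() and clone(),
    # which RFECV relies on, can see and copy them. regressor=None makes
    # TransformedTargetRegressor fall back to LinearRegression.
    def __init__(self, regressor=None, func=np.log, inverse_func=np.exp):
        super().__init__(regressor=regressor, func=func, inverse_func=inverse_func)

    def fit(self, X, y, **kwargs):
        super().fit(X, y, **kwargs)
        # Expose the inner regressor's coefficients for RFECV.
        self.coef_ = self.regressor_.coef_
        return self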

You can then use myestimator in your code, as shown below:

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
estimator = linear_model.LinearRegression()
log_estimator = myestimator(regressor=LinearRegression(),func=np.log,inverse_func=np.exp)

selector = RFECV(estimator, step=1, cv=5, scoring='r2')
selector = selector.fit(X, y)
log_selector = RFECV(log_estimator, step=1, cv=5, scoring='r2')
log_selector = log_selector.fit(X, y)
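On scikit-learn 0.24 or later, a subclass may not be needed at all: RFE and RFECV accept an importance_getter parameter that takes an attribute path. The sketch below assumes such a version is installed and reuses the same example data; it is an alternative to the subclass above, not a replacement for it.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Assumed example data; y is positive, so np.log is safe here.
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

ttr = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)

# importance_getter tells RFECV where to find the coefficients on the
# fitted wrapper: the inner fitted regressor lives in `regressor_`.
log_selector = RFECV(
    ttr, step=1, cv=5, scoring="r2", importance_getter="regressor_.coef_"
)
log_selector.fit(X, y)

print("no of feat:", log_selector.n_features_)
```

With this approach the estimator stays a stock TransformedTargetRegressor, so there is no custom __init__ or fit to keep in sync with the library.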

I ran your sample code and show the results below.

Sample output

print("**Simple Model**")
print("RFECV, r2 scores: ", np.round(selector.grid_scores_,2))
scores = cross_val_score(estimator, X, y, cv=5)
print("cross_val, mean r2 score: ", round(np.mean(scores),2), ", same as RFECV score with all features") 
print("no of feat: ", selector.n_features_ )

print("**Log Model**")
log_scores = cross_val_score(log_estimator, X, y, cv=5)
print("RFECV, r2 scores: ", np.round(log_selector.grid_scores_,2))
print("cross_val, mean r2 score: ", round(np.mean(log_scores),2)) 
print("no of feat: ", log_selector.n_features_ )


**Simple Model**
RFECV, r2 scores:  [0.45 0.6  0.63 0.68 0.68 0.69 0.68 0.67 0.66 0.66]
cross_val, mean r2 score:  0.66 , same as RFECV score with all features
no of feat:  6
**Log Model**
RFECV, r2 scores:  [0.41 0.51 0.59 0.59 0.58 0.56 0.54 0.53 0.55 0.55]
cross_val, mean r2 score:  0.55
no of feat:  4
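The printing code above reads grid_scores_, which was deprecated in scikit-learn 1.0 and removed in 1.2 in favor of cv_results_. A small helper along the following lines might let the same report run on either side of that change; mean_cv_scores is a hypothetical name introduced here, not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression


def mean_cv_scores(selector):
    """Return the mean CV score per feature-subset size of a fitted RFECV.

    Hypothetical helper: prefers cv_results_ (scikit-learn 1.0 and later)
    and falls back to grid_scores_ on older releases.
    """
    if hasattr(selector, "cv_results_"):
        return np.asarray(selector.cv_results_["mean_test_score"])
    return np.asarray(selector.grid_scores_)


# Assumed example data, matching the code above.
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
selector = RFECV(LinearRegression(), step=1, cv=5, scoring="r2").fit(X, y)

scores = mean_cv_scores(selector)
print("RFECV, r2 scores: ", np.round(scores, 2))
print("no of feat: ", selector.n_features_)
```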

Hope this helps!