Column names in a fitted regsubsets object differ from the data when selecting the best variables

Posted: 2016-08-03 16:33:11

Tags: r machine-learning

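I am trying to get the important variables (column names) from regsubsets. I would like to extract the important variables one by one so that I can analyze them. Here is the program: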

library(leaps)
library(ISLR)
data(Hitters)
reg_fit=regsubsets(Salary~., data = Hitters, nvmax = 10, method = "forward")

The problem is that the column names in reg_fit are different from the column names in the Hitters data.

Here is the output for the original data:

names(Hitters)
##  [1] "AtBat"     "Hits"      "HmRun"     "Runs"      "RBI"      
##  [6] "Walks"     "Years"     "CAtBat"    "CHits"     "CHmRun"   
## [11] "CRuns"     "CRBI"      "CWalks"    "League"    "Division" 
## [16] "PutOuts"   "Assists"   "Errors"    "Salary"    "NewLeague"

Here is the output extracted from reg_fit:

colnames(summary(reg_fit)$which)
##  [1] "(Intercept)" "AtBat"       "Hits"        "HmRun"       "Runs"       
##  [6] "RBI"         "Walks"       "Years"       "CAtBat"      "CHits"      
## [11] "CHmRun"      "CRuns"       "CRBI"        "CWalks"      "LeagueN"    
## [16] "DivisionW"   "PutOuts"     "Assists"     "Errors"      "NewLeagueN"

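Note that League changed to LeagueN and Division changed to DivisionW (and NewLeague to NewLeagueN). Any idea whether this is a bug, or is there a simple way to get the original column names from reg_fit?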

1 Answer:

Answer 0 (score: 1)

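This is not a bug. regsubsets expands each categorical (factor) variable into indicator (0/1) variables so that it can be used in the regression. The new name tells you which level is coded as 1 in the indicator: LeagueN is 1 when League is "N", and DivisionW is 1 when Division is "W". The remaining level ("A" or "E") is the baseline coded as 0.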
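The renaming can be reproduced with base R's model.matrix(), which is assumed here to be how the regsubsets formula method builds its design matrix (the LeagueN/DivisionW names are consistent with that). The sketch below uses a small hypothetical data frame rather than Hitters so it runs without leaps or ISLR. It also shows one way to map each design-matrix column back to the variable it came from, via the "assign" attribute that model.matrix() attaches.

```r
# Hypothetical toy data with a two-level factor, similar to Hitters$League
toy <- data.frame(
  Salary = c(100, 250, 175, 300),
  Hits   = c(80, 120, 95, 140),
  League = factor(c("A", "N", "A", "N"))
)

# Treatment contrasts (R's default): the first level "A" is the baseline,
# the other level gets a 0/1 column named <variable><level>
tt <- terms(Salary ~ ., data = toy)
mm <- model.matrix(tt, data = toy)
print(colnames(mm))  # values: "(Intercept)" "Hits" "LeagueN"

# "assign" gives, for each column, the index of the term it came from
# (0 = intercept), so dummy names can be traced back to variable names
source_var <- c("(Intercept)", attr(tt, "term.labels"))[attr(mm, "assign") + 1]
print(source_var)    # values: "(Intercept)" "Hits" "League"
```

This only labels columns with their source variable; it does not merge dummies, so a factor with more than two levels would still map to several columns.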

If you want to avoid this, you can preprocess the data before fitting. Here is an example for the variable League:

League <- rep(0, nrow(Hitters))     # Hitters has 322 rows
League[Hitters$League == "N"] <- 1  # 1 for "N" (National League), 0 otherwise
Hitters$League <- League            # replace the factor with the numeric indicator
reg_fit <- regsubsets(Salary ~ ., data = Hitters, nvmax = 10, method = "forward")
colnames(summary(reg_fit)$which)

In the example above, I created a numeric variable League that equals 1 when Hitters$League equals "N" (and 0 otherwise), and used it to replace the factor version of the League variable in Hitters. Because the column is now numeric, it keeps the name League in reg_fit.
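A quick way to check that this preprocessing removes the suffix, without loading Hitters, is to run the same conversion on a small hypothetical data frame and inspect the model.matrix() column names (again assuming regsubsets names its columns the same way model.matrix() does). It uses as.numeric() on a logical comparison, which gives the same 0/1 vector as the rep()/assignment steps above.

```r
# Hypothetical toy data, League stored as a factor as in Hitters
toy <- data.frame(
  Salary = c(100, 250, 175, 300),
  Hits   = c(80, 120, 95, 140),
  League = factor(c("A", "N", "A", "N"))
)

# Same preprocessing as the answer: 1 when League == "N", else 0
toy$League <- as.numeric(toy$League == "N")

# A numeric column enters the design matrix as-is, so no level suffix
mm <- model.matrix(Salary ~ ., data = toy)
print(colnames(mm))  # values: "(Intercept)" "Hits" "League"
```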

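For a binary factor variable, you could instead relabel the columns in the fitted object after running the regression, but that does not work when the factor has more than 2 levels, because each non-baseline level gets its own indicator column. For a multi-level factor variable, you need to create several indicator variables in the original data set, in the same way as in the example above.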
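For a factor with more than two levels, the preprocessing could look like the sketch below. Hitters only has two-level factors, so the Position variable and its levels "C", "IF", "OF" are hypothetical, as is the Position_<level> naming. The first level is used as the baseline, mirroring R's default treatment contrasts.

```r
# Hypothetical data with a three-level factor
toy <- data.frame(
  Salary   = c(100, 250, 175, 300, 220),
  Position = factor(c("C", "IF", "OF", "IF", "C"))
)

# One 0/1 indicator per non-baseline level, added to the original data
for (lvl in levels(toy$Position)[-1]) {
  toy[[paste0("Position_", lvl)]] <- as.numeric(toy$Position == lvl)
}
toy$Position <- NULL  # drop the factor so it is not expanded a second time

mm <- model.matrix(Salary ~ ., data = toy)
print(colnames(mm))  # values: "(Intercept)" "Position_IF" "Position_OF"
```

Choosing your own indicator names this way means the names reported by regsubsets are exactly the column names you created.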