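I have trained a model with nvmax = 5 on a data set with 217 variables. The problem is that the variables selected according to summary() differ from the ones returned by coef().
The data file can be found here: http://www.atlasbrasil.org.br/2013/data/rawData/atlas2013_dadosbrutos_pt.xlsx
Then I run: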
library("readxl")
library("MASS")
library("leaps")
library("caret")
dados_municipios <- read_excel("atlas2013_dadosbrutos_pt.xlsx",sheet=2,col_types="numeric")
dados_municipios_sc_2010 <- dados_municipios[dados_municipios$UF == 42 & dados_municipios$ANO == 2010,]
dados <- subset(dados_municipios_sc_2010,
                select=-c(ANO,UF,Codmun6,Codmun7,Município,
                          CORTE1,CORTE2,CORTE3,CORTE4,CORTE9,
                          RDPC1,RDPC10,RDPC2,RDPC3,RDPC4,RDPC5,RDPCT,
                          RIND,RMPOB,RPOB))
# Set up repeated k-fold cross-validation (5 folds, 5 repeats);
# method="cv" would ignore repeats, so "repeatedcv" is used here
controle <- trainControl(method="repeatedcv", number=5, repeats=5)
set.seed(100)
modelo1 <- train(RDPC~., data=dados, method="leapBackward", tuneGrid=data.frame(nvmax=5), trControl=controle)
# The differences are here:
summary(modelo1) # The summary stars [*] a variable named PIA in the row for 5 variables (nv = 5)
coef(modelo1$finalModel, 5) # Here, for nv = 5, PIA is not among the selected variables
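To make the mismatch easier to inspect than by reading the starred table, the two variable sets can be compared programmatically. This is only a minimal sketch under assumptions: it uses the built-in mtcars data as a stand-in for the Atlas Brasil file, and it calls leaps::regsubsets directly, which is the function the leapBackward method wraps. The names vars_summary, vars_coef, only_in_summary and only_in_coef are made up for illustration.

```r
library(leaps)

# Stand-in data (assumption): mtcars instead of the Atlas Brasil spreadsheet
fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5, method = "backward")

# summary() on a regsubsets object returns a logical matrix `which`:
# one row per reported model, one column per term (including the intercept)
s <- summary(fit)
id <- 5

# Terms marked as included in row `id` of the summary
vars_summary <- colnames(s$which)[s$which[id, ]]

# Terms for which coef() returns a coefficient for the same model id
vars_coef <- names(coef(fit, id = id))

# Any term listed here appears in one output but not the other
only_in_summary <- setdiff(vars_summary, vars_coef)
only_in_coef <- setdiff(vars_coef, vars_summary)
print(only_in_summary)
print(only_in_coef)
```

With the model from my code, the same comparison would use modelo1$finalModel in place of fit.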
There is also another odd behavior: I set nvmax = 5, yet the model goes up to nv = 6. Is this expected?
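One way to check how many predictors each reported model actually contains is to count the TRUE entries per row of the summary matrix, leaving out the intercept column. Again a minimal sketch on the mtcars stand-in data (an assumption, not my real data); n_predictors is a hypothetical name.

```r
library(leaps)

# Stand-in data (assumption): mtcars instead of the Atlas Brasil spreadsheet
fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5, method = "backward")
s <- summary(fit)

# Drop the intercept column so that only real predictors are counted
is_predictor <- colnames(s$which) != "(Intercept)"
n_predictors <- rowSums(s$which[, is_predictor, drop = FALSE])

# One line per reported model: its row label and its number of predictors
print(data.frame(row = rownames(s$which), n_predictors = n_predictors))
```

If a row with more predictors than nvmax shows up when this is run on modelo1$finalModel, that would confirm the nv = 6 behavior described above.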