插入:选择自定义摘要函数的最小值

时间:2017-08-24 09:36:55

标签: r r-caret

我在插入符号中自定义了摘要功能以计算Brier分数。计算工作正常,但我没能选择最佳模型作为最低Brier分数的模型。

library(data.table)
N      <- 1000
X1     <- rnorm(N, 175, 7)
X2     <- rnorm(N,  30, 8)
X3     <- rnorm(N,0,1)
X4     <- rnorm(N,50,3)
X5     <- rnorm(N,2,1)
X6     <- rnorm(N,10,2)
X7     <- runif(N,0,1)
length   <- sample(1:5,N,T)
Ycont  <- 0.5*X1 - 0.3*X2  +0.01*X3 + 0.2*X4+0.24*X5+X6+X7*0.002 + 10 + rnorm(N, 0, 6)
Ycateg <- ntile(Ycont,3)
df     <- data.frame(id=1:N,length,X1, X2,X3,X4,X5,X6,X7, Ycateg)
df$Ycateg=ifelse(df$Ycateg==1,"current",ifelse(df$Ycateg==2,"default","prepaid"))

df=setDT(df)[,.SD[rep(1L,length)],by = id]
df=df[ , time := 1:.N , by=id]
df=df[,-c("length")]
head(df)


customSummary <- function (data, lev = NULL, model = NULL) { # for training on a next-period return
  Y_dummy = model.matrix( ~ data[, "obs"] - 1) # create dummy - for each level of the outcome
  Y_pre=as.data.frame(data[ , c("current","default","prepaid")])
  Brier=(as.numeric(Y_dummy) - Y_pre)^2 
  Brier_all=sum(Brier)
  names(Brier_all)="Brier Score"
  return(Brier_all)
}


# which type of cross validation to do
fitControl <- trainControl(method = 'cv',number=5,classProbs=TRUE,summaryFunction=customSummary, selectionFunction = "best" )
# tuning parameters
grid <- expand.grid(mtry = 1:5 )

cv=train(as.factor(Ycateg)~.,
         data = df,
         method = "ranger",
         trControl = fitControl,
         tuneGrid = grid
)

cv 

收益率:

......
 mtry  Brier Score
  1     181.02207  
  2      92.22158  
  3      85.66351  
  4      81.85301  
  5      79.73677  

Brier Score was used to select the optimal model using  the largest value.
The final value used for the model was mtry = 1.

到目前为止,我使用的是trainControlselectionFunction = "best",这肯定是不合适的。

所以我的主要观点是,如何选择具有最低Brier分数的模型?

1 个答案:

答案 0 :(得分:1)

成功的关键是在train()调用中设置maximize = FALSE,所以

cv=train(as.factor(Ycateg)~.,
         data = df,
         method = "ranger",
          maximize=FALSE,
         trControl = fitControl,
         tuneGrid = grid        # tuning parameters
)

cv
  ...
  mtry  Brier Score
  1     172.09248  
  2      86.32899  
  3      80.13424  
  4      77.16511  
  5      75.32933  

Brier Score was used to select the optimal model using  the smallest value.
The final value used for the model was mtry = 5.