我在插入符号中自定义了摘要功能以计算Brier分数。计算工作正常,但我没能选择最佳模型作为最低Brier分数的模型。
library(data.table)
N <- 1000
X1 <- rnorm(N, 175, 7)
X2 <- rnorm(N, 30, 8)
X3 <- rnorm(N,0,1)
X4 <- rnorm(N,50,3)
X5 <- rnorm(N,2,1)
X6 <- rnorm(N,10,2)
X7 <- runif(N,0,1)
length <- sample(1:5,N,T)
Ycont <- 0.5*X1 - 0.3*X2 +0.01*X3 + 0.2*X4+0.24*X5+X6+X7*0.002 + 10 + rnorm(N, 0, 6)
Ycateg <- ntile(Ycont,3)
df <- data.frame(id=1:N,length,X1, X2,X3,X4,X5,X6,X7, Ycateg)
df$Ycateg=ifelse(df$Ycateg==1,"current",ifelse(df$Ycateg==2,"default","prepaid"))
df=setDT(df)[,.SD[rep(1L,length)],by = id]
df=df[ , time := 1:.N , by=id]
df=df[,-c("length")]
head(df)
customSummary <- function (data, lev = NULL, model = NULL) { # for training on a next-period return
Y_dummy = model.matrix( ~ data[, "obs"] - 1) # create dummy - for each level of the outcome
Y_pre=as.data.frame(data[ , c("current","default","prepaid")])
Brier=(as.numeric(Y_dummy) - Y_pre)^2
Brier_all=sum(Brier)
names(Brier_all)="Brier Score"
return(Brier_all)
}
# which type of cross validation to do
fitControl <- trainControl(method = 'cv',number=5,classProbs=TRUE,summaryFunction=customSummary, selectionFunction = "best" )
# tuning parameters
grid <- expand.grid(mtry = 1:5 )
cv=train(as.factor(Ycateg)~.,
data = df,
method = "ranger",
trControl = fitControl,
tuneGrid = grid
)
cv
收益率:
......
mtry Brier Score
1 181.02207
2 92.22158
3 85.66351
4 81.85301
5 79.73677
Brier Score was used to select the optimal model using the largest value.
The final value used for the model was mtry = 1.
到目前为止,我使用的是trainControl
和selectionFunction = "best"
,这肯定是不合适的。
所以我的主要观点是,如何选择具有最低Brier分数的模型?
答案 0 :(得分:1)
成功的关键是在train()调用中设置maximize = FALSE,所以
cv=train(as.factor(Ycateg)~.,
data = df,
method = "ranger",
maximize=FALSE,
trControl = fitControl,
tuneGrid = grid # tuning parameters
)
cv
...
mtry Brier Score
1 172.09248
2 86.32899
3 80.13424
4 77.16511
5 75.32933
Brier Score was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 5.