用户在caret,logloss中定义的summaryFunction

时间:2017-11-11 17:49:10

标签: r r-caret

使用插入符包,我无法使用以下用户定义的摘要功能。它应该计算logloss,但我不断发现logloss。下面是一个可重复的例子:

data <- data.frame('target' = sample(c('Y','N'),100,replace = T), 'X1' = runif(100), 'X2' = runif(100))

log.loss2 <- function(data, lev = NULL, model = NULL) {
  logloss = -sum(data$obs*log(data$Y) + (1-data$obs)*log(1-data$Y))/length(data$obs)
  names(logloss) <- c('LL')
  logloss
}

fitControl <- trainControl(method="cv",number=1, classProbs = T, summaryFunction = log.loss2)

my.grid <- expand.grid(.decay = c(0.05), .size = c(2))

fit.nnet2 <- train(target ~., data = data,
                  method = "nnet", maxit = 500, metric = 'LL',
                  tuneGrid = my.grid, verbose = T)

2 个答案:

答案 0 :(得分:2)

错误原因是您未在训练中加入trControl = fitControl。但是,这会导致另一个错误,原因是data$obsdata$pred是因素 - 需要转换为数字,这会产生12,减去{ {1}}提供了所需的10

1

有几点需要注意:

此丢失函数仅适用于包含log.loss2 <- function(data, lev = NULL, model = NULL) { data$pred <- as.numeric(data$pred)-1 data$obs <- as.numeric(data$obs)-1 logloss = -sum(data$obs*log(data$Y) + (1-data$obs)*log(1-data$Y))/length(data$obs) names(logloss) <- c('LL') logloss } fitControl <- trainControl(method="cv",number=1, classProbs = T, summaryFunction = log.loss2) fit.nnet2 <- train(target ~., data = data, method = "nnet", maxit = 500, metric = "LL" , tuneGrid = my.grid, verbose = T, trControl = fitControl) #output Neural Network 100 samples 2 predictor 2 classes: 'N', 'Y' No pre-processing Resampling: Cross-Validated (1 fold) Summary of sample sizes: 0 Resampling results: LL 0.6931472 Tuning parameter 'size' was held constant at a value of 2 Tuning parameter 'decay' was held constant at a value of 0.05 / N作为类的数据,因为概率定义为Y,更好的方法是找到类的名称并使用该类。此外,截断data$Y以来的概率值的良好做法不是一个好主意:

log(0)

答案 1 :(得分:1)

@missuse回答了这个问题,但是我想在logloss函数中添加权重选项:

# Cross-entropy error function
LogLoss <- function(pred, true, eps = 1e-15, weights = NULL) {
  # Bound the results
  pred = pmin(pmax(pred, eps), 1 - eps)

  if (is.null(weights)) {
    return(-(sum(
      true * log(pred) + (1 - true) * log(1 - pred)
    )) / length(true))
  } else{
    return(-weighted.mean(true * log(pred) + (1 - true) * log(1 - pred), weights))
  }
}

# Caret train weighted logloss summary function
caret_logloss <- function(data, lev = NULL, model = NULL) {
  cls <- levels(data$obs) #find class names
  loss <- LogLoss(
    pred = data[, cls[2]],
    true = as.numeric(data$obs) - 1,
    weights = data$weights
  )
  names(loss) <- c('MyLogLoss')
  loss
}