我正在R中使用cv.glmnet运行针对62万个观测值和21个变量的交叉验证的二进制ElasticNet回归。
A tibble: 62,905 x 13
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
<dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 -37.8 0 165. 269. 21.9 0.607 84.0 0 65.1 290. 4.36 8 0
2 -68.1 0 303. 168. 44.5 1.41 89.9 0 46.6 296. 0.692 34.7 0
3 -54.3 0 332. 168. 44.5 1.41 89.9 0 46.6 296. 0.692 35.8 1
4 -108. 0 338. 168. 44.5 1.41 89.9 0 46.6 296. 0.692 30.3 0
5 -60.3 0 374. 171. 35.7 2.30 88.9 0.3 51.7 295. 4.01 29.6 1
6 -82.8 0 48.2 133. 18.4 0.210 84.9 0 65.1 289. 1.35 18.7 0
7 -99.6 0 299. 219. 42.6 2.09 90.8 0 34.2 297. 1.42 7 1
8 -98.1 0 116. 153. 44.7 0.988 89.0 0 41.3 298. 0.235 32.6 0
完成cv.glment后,我预测了y
的新结果。现在,这比我的实际观察结果高出十倍。为什么最终我得到比预测的y
高10倍的结果?
这是我的代码:
set.seed(123)
library(caret)
library(tidyverse)
library(glmnet)
library(ROCR)
training.samples <- data$V1 %>% createDataPartition(p = 0.8, list = FALSE)
train <- data[training.samples, ]
test <- data[-training.samples, ]
x.train <- data.frame(train[, names(train) != "V1"])
x.train <- data.matrix(x.train)
y.train <- train$fire
x.test <- data.frame(test[, names(test) != "V1"])
x.test <- data.matrix(x.test)
y.test <- test$fire
> model <- cv.glmnet(x.train, y.train, type.measure = c("auc"), alpha = i/10, family = "binomial", parallel = TRUE)
> predicted1 <- predict(model, s = "lambda.min", newx = x.test)
> View(predicted1)
这是我的y.test
结果的两个直方图。旁注:我的1
观察中确实有100个y.test
,但还有更多的0
。
交叉验证中出了什么问题?
编辑:@ smiling4ever发表评论后,我成功了。因此,我添加了type = "response"
predicted1 <- predict(model, s = "lambda.min", newx = x.test, type = "response")