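I'm currently at a dead end.
I have a data frame of categorical data, and I'm trying to do feature selection with glmnet (through the R caret package).
However, every row of the data frame contains at least one NA.
The steps I had in mind are as follows: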
### Reproducible example data frame
set.seed(123)
library(earth)
library(RANN)
library(caret)
library(tidyverse)
data(etitanic)
df <- etitanic[,-4]
df <- df[,c(2,1,3,4)]
OUTCOME <- df[,1]
x <- df[,c(2:4)]
x <- as.data.frame(lapply(x, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.8, 0.20), size = length(cc), replace = TRUE) ]))
colnames(x) <- c("Predictor_1", "Predictor_2", "Predictor_3")
df <- cbind(OUTCOME, x)
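# Convert every column to a factor in place; sapply() would collapse the
# factors into a character matrix, which R >= 4.0 does not turn back into factors
df[] <- lapply(df, as.factor)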
df <- df[rowSums(is.na(df)) > 0,]
head(df)
#   OUTCOME Predictor_1 Predictor_2 Predictor_3
# 3       0         1st        <NA>           1
# 4       0        <NA>        <NA>           1
# 5       0        <NA>      female        <NA>
# 6       1         1st        <NA>           0
# 7       1         1st        <NA>           1
# 8       0        <NA>        male           0
### STEP 1: convert categorical variables into dummy variables
x <- model.frame(OUTCOME ~ ., df, na.action=NULL)[,-1]
# since all rows contain at least one NA, the data frame remains unchanged
### STEP 2: Partitioning & imputing missing values
trainRowNumbers <- createDataPartition(df$OUTCOME, p=0.8, list=FALSE)
trainData <- df[trainRowNumbers,]
testData <- df[-trainRowNumbers,]
preProcess_missingdata_model <- preProcess(trainData, method='knnImpute')
# Warning in pre_process_options(method, column_types) :
# The following pre-processing methods were eliminated:
# 'knnImpute', 'center', 'scale'
trainData <- predict(preProcess_missingdata_model, newdata = trainData)
testData <- predict(preProcess_missingdata_model, testData)
### STEP 3: build the model
# Setup a grid range of lambda values
lambda <- 10^seq(-3, 3, length = 100)
# Splitting parameters of the trainData
control <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 3,
  savePredictions = "final",
  summaryFunction = multiClassSummary
)
ridge <- train(
  x,
  df$OUTCOME,
  method = "glmnet",
  trControl = control,
  tuneGrid = expand.grid(alpha = 0, lambda = lambda),
  na.action = na.pass
)
# Something is wrong; all the Accuracy metric values are missing: ...
# Warnings:
# ...
# 30: model fit failed for Fold10.Rep3: alpha=0, lambda=1000 Error in (function (x, y, family = c("gaussian", "binomial", "poisson", :
#   unused argument (na.action = function (object, ...)
#   object)
#
# 31: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... :
#   There were missing values in resampled performance measures.
Is there a way to resolve this situation?
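
For reference, here is a minimal sketch of what Step 1 seems to be aiming for, on made-up data rather than the etitanic example. It assumes the goal is a numeric dummy-variable matrix that keeps the rows containing NA, so that they could be imputed afterwards. The `toy` data frame and its values are hypothetical, and only base R's `model.frame` and `model.matrix` are used. It does not touch the caret or glmnet calls above.

```r
# Hypothetical toy data: two factor predictors, each with one missing value
toy <- data.frame(
  Predictor_1 = factor(c("1st", NA, "2nd", "3rd")),
  Predictor_2 = factor(c("male", "female", NA, "male"))
)

# Build the model frame with na.pass so rows with NA are kept
# (the default na.action, na.omit, would drop them)
mf <- model.frame(~ ., data = toy, na.action = na.pass)

# Expand the factors into dummy columns and drop the intercept column;
# an NA in a factor becomes NA in that factor's dummy columns
dummies <- model.matrix(~ ., data = mf)[, -1, drop = FALSE]

print(dummies)
print(anyNA(dummies))
```

The resulting matrix is numeric, so imputation methods that only handle numeric columns would at least have something to work with. The missing values would still have to be filled in before the matrix reaches glmnet, and whether imputing 0/1 dummy columns is statistically sensible for this data is a separate question.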