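I am getting a strange error message
Error in `[.data.frame`(data, , lvls[1]) : undefined columns selected
when I train a glmnet model using caret. I used essentially the same code and the same predictors for an ordinal model (just with a different factor for y), and that ran fine. That computation took 400 core hours, so I cannot show it here.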
#Source a small subset of data
source("https://gist.githubusercontent.com/FredrikKarlssonSpeech/ebd9fccf1de6789a3f529cafc496a90c/raw/efc130e41c7d01d972d1c69e59bf8f5f5fea58fa/voice.R")
trainIndex <- createDataPartition(notna$RC, p = .75,
                                  list = FALSE,
                                  times = 1)
training <- notna[ trainIndex[,1],] %>%
  select(RC,FCoM_envel:ATrPS_freq,`Jitter->F0_abs_dif`:RPDE)
testing <- notna[-trainIndex[,1],] %>%
  select(RC,FCoM_envel:ATrPS_freq,`Jitter->F0_abs_dif`:RPDE)
fitControl <- trainControl(## 10-fold CV
  method = "CV",
  number = 10,
  allowParallel=TRUE,
  savePredictions="final",
  summaryFunction=twoClassSummary)
vtCVFit <- train(x=training[-1],y=training[,"RC"],
                 method = "glmnet",
                 trControl = fitControl,
                 preProcess=c("center", "scale"),
                 metric="Kappa"
)
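I cannot find anything obviously wrong with the data. There are no NAs:
table(is.na(training))
FALSE
43166
So I do not see why it would try to index outside the number of columns.
Any suggestions?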
Answer 0 (score: 5)
You have to remove summaryFunction = twoClassSummary from trainControl(). With that removed, it worked for me.
fitControl <- trainControl(## 10-fold CV
  method = "CV",
  number = 10,
  allowParallel=TRUE,
  savePredictions="final")
vtCVFit <- train(x=training[-1],y=training[,"RC"],
                 method = "glmnet",
                 trControl = fitControl,
                 preProcess=c("center", "scale"),
                 metric="Kappa")
print(vtCVFit)
#glmnet
#113 samples
#381 predictors
# 2 classes: 'NVT', 'VT'
#Pre-processing: centered (381), scaled (381)
#Resampling: Bootstrapped (25 reps)
#Summary of sample sizes: 113, 113, 113, 113, 113, 113, ...
#Resampling results across tuning parameters:
# alpha lambda Accuracy Kappa
# 0.10 0.01113752 0.5778182 0.1428393
# 0.10 0.03521993 0.5778182 0.1428393
# 0.10 0.11137520 0.5778182 0.1428393
# 0.55 0.01113752 0.5778182 0.1428393
# 0.55 0.03521993 0.5748248 0.1407333
# 0.55 0.11137520 0.5749980 0.1136131
# 1.00 0.01113752 0.5815391 0.1531280
# 1.00 0.03521993 0.5800217 0.1361240
# 1.00 0.11137520 0.5939621 0.1158007
#Kappa was used to select the optimal model using the largest value.
#The final values used for the model were alpha = 1 and lambda = 0.01113752.
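Some context on why dropping the summary function may help: the error message indicates that twoClassSummary indexes the resampled prediction data by the first class level (data[, lvls[1]]), meaning it expects one class-probability column per level. caret only produces those columns when classProbs = TRUE is set in trainControl(). The base R sketch below imitates this with a toy data frame; the values are invented for illustration and are not taken from the voice data.

```r
# Toy stand-in (hypothetical values) for the data frame a caret summary
# function receives when classProbs = FALSE: observed and predicted classes only.
lvls <- c("NVT", "VT")
data <- data.frame(
  obs  = factor(c("NVT", "VT", "VT"),  levels = lvls),
  pred = factor(c("NVT", "NVT", "VT"), levels = lvls)
)

# Selecting a column named after the first class level fails,
# because no probability column called "NVT" exists.
msg <- tryCatch(
  {
    data[, lvls[1]]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With per-class probability columns present (as with classProbs = TRUE),
# the same selection succeeds.
data_with_probs <- cbind(data,
                         NVT = c(0.8, 0.6, 0.1),
                         VT  = c(0.2, 0.4, 0.9))
p_first <- data_with_probs[, lvls[1]]
print(p_first)
```

If ROC-based metrics are actually wanted, an alternative to removing the summary function is to keep summaryFunction = twoClassSummary, add classProbs = TRUE to trainControl(), and pass metric = "ROC" to train(), since twoClassSummary reports ROC, Sens and Spec rather than Kappa. Note also that the printed output above reports bootstrap resampling (25 reps), which suggests that method = "CV" was not recognised; the value documented by caret is the lowercase "cv".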
Answer 1 (score: 2)
Convert your factors to character with the following code and see whether it works:
training <- data.frame(lapply(training , as.character), stringsAsFactors=FALSE)
I would have left this suggestion as a comment, but I cannot (my reputation is below 50).