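I am trying to use various prediction algorithms from the caret package in R on a regression problem, that is, my target variable is continuous. caret decides that classification is the appropriate problem type, and when I pass any regression model I get an error that says "wrong model type for classification". For reproducibility, let us look at the Combined Cycle Power Plant Data Set. The data is in CCPP.zip. Let us predict power as a function of the other variables. Power is a continuous variable.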
library(readxl)
library(caret)
power_plant = read_excel("Folds5x2_pp.xlsx")
apply(power_plant,2, class) # shows all columns are numeric
control <- trainControl(method="repeatedcv", number=10, repeats=5)
my_glm <- train(power_plant[,1:4], power_plant[,5],
method = "lm",
preProc = c("center", "scale"),
trControl = control)
[Screenshot of the error message]
Answer 0 (score: 2)
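For some reason, caret is confused by tibbles, the tidyverse variant of data frames that read_excel returns. Converting the tibble to a plain data frame before passing it to caret makes everything work: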
library(readxl)
library(caret)
power_plant = read_excel("Folds5x2_pp.xlsx")
apply(power_plant,2, class) # shows all columns are numeric
power_plant <- data.frame(power_plant)
control <- trainControl(method="repeatedcv", number=10, repeats=5)
my_glm <- train(power_plant[,1:4], power_plant[,5],
method = "lm",
preProc = c("center", "scale"),
trControl = control)
my_glm
which gives the following output:
Linear Regression
9568 samples
4 predictor
Pre-processing: centered (4), scaled (4)
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 8612, 8612, 8611, 8612, 8612, 8610, ...
Resampling results:
  RMSE      Rsquared
  4.556703  0.9287933
Tuning parameter 'intercept' was held constant at a value of TRUE
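A likely explanation, offered here as an assumption rather than something the answer states: single-bracket column indexing on a tibble, as in power_plant[,5], always returns a one-column tibble, whereas the same expression on a plain data frame drops to a numeric vector, so train() no longer receives a numeric outcome. The sketch below uses a small made-up table (the values are illustrative, not taken from the data set) to show the difference, and that [[ ]] extracts the vector from either type.

```r
library(tibble)

# Made-up stand-in for two columns of the power plant data.
tb <- tibble(AT = c(14.9, 25.2, 5.1), PE = c(463.3, 444.4, 488.6))
df <- data.frame(tb)

class(tb[, 2])   # still a tibble: single brackets never drop to a vector
class(df[, 2])   # "numeric": a plain data frame drops a single column
class(tb[[2]])   # "numeric": double brackets extract the column itself
```

Under that assumption, passing power_plant[[5]] as the outcome should also work without converting the whole table.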
Answer 1 (score: 0)
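I ran into a similar error when I passed the formula as a named argument, formula = y ~ x. Dropping the argument name and passing y ~ x on its own fixed it.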
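A minimal sketch of the unnamed formula call, assuming made-up data and placeholder column names x and y (neither comes from the question). The presumed reason it matters is that train() picks its method from the first positional argument, so a formula supplied under a different name is not treated as the model formula.

```r
library(caret)

# Made-up regression data; x and y are placeholder names.
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 2 * d$x + rnorm(50, sd = 0.1)

control <- trainControl(method = "cv", number = 5)

# The formula is passed positionally, without an argument name.
fit <- train(y ~ x, data = d, method = "lm", trControl = control)
fit$modelType   # the problem type caret inferred from the outcome
```

With a numeric outcome column, caret should treat this as a regression problem.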