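I'm trying to compute the RMSE of a regression tree's predictions, but I keep getting an error.
I've loaded a data frame, removed all records with missing values, and split it into training and validation sets: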
income.df <- read.csv("adult.data.csv")
income.df$Workclass <- gsub("[?]", NA, as.character(income.df$Workclass))
income.df$Workclass <- as.factor(income.df$Workclass)
income.df$occupation <- gsub("[?]", NA, as.character(income.df$occupation))
income.df$occupation <- as.factor(income.df$occupation)
income.df$native.country <- gsub("[?]", NA, as.character(income.df$native.country))
income.df$native.country <- as.factor(income.df$native.country)
income.sub.df <- na.omit(income.df)
levels(income.sub.df$income) <- c(1,0)
names(income.sub.df)[15] <- "isBelow50k"
# sample row indices (not Age values) for a 60% training split
train.index2 <- sample(nrow(income.sub.df), floor(nrow(income.sub.df) * 0.6))
train2.df <- income.sub.df[train.index2, ]
valid2.df <- income.sub.df[-train.index2, ]
Then I built an rpart tree:
library(rpart)
library(rpart.plot)
library(forecast)
tr.income <- rpart(isBelow50k ~ ., data = train2.df)
prp(tr.income)
predrpart <- predict(tr.income, newdata = valid2.df)
accuracy(predrpart, valid2.df$isBelow50k)
The error I keep getting is:
Error in accuracy(predrpart, valid2.df$isBelow50k) :
First argument should be a forecast object or a time series.
I've tried converting it to a vector, but nothing seems to work.
Answer 0 (score: 0)
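Your problem is a classification problem, not a regression problem. RMSE (and the accuracy function from the forecast package) is not meant for classification problems. Because isBelow50k is a factor, rpart fits a classification tree, and predict() then returns a matrix of class probabilities rather than a numeric vector, which is why accuracy() rejects it. Instead, I'd suggest using metrics that can be derived from a confusion matrix.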
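To see what predict() hands back for a classification tree, here is a minimal sketch. It assumes the kyphosis example data set that ships with rpart as a stand-in for the income data; the names fit, probs, and classes are just for illustration.

```r
library(rpart)  # kyphosis is an example data set bundled with rpart

# Kyphosis is a two-level factor, so rpart fits a classification tree
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# With a factor response and no type argument, predict() returns class probabilities
probs <- predict(fit, newdata = kyphosis)

# type = "class" returns the predicted labels as a factor instead
classes <- predict(fit, newdata = kyphosis, type = "class")

is.matrix(probs)    # one column per class level
colnames(probs)
is.factor(classes)
```

If you only need the predicted labels, type = "class" avoids thresholding the probabilities by hand.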
Here is one option for modifying your code to compute a simple confusion matrix:
tr.income <- rpart(isBelow50k ~ ., data = train2.df)
prp(tr.income)
predrpart <- factor(ifelse(predict(tr.income, newdata = valid2.df)[,2] > .5, '>50K', '<=50K'))
table(predicted = predrpart, actual = valid2.df$isBelow50k)
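From a confusion matrix like the one above you can derive the usual summary metrics. The sketch below uses small made-up label vectors (hypothetical data, not output from the income model) and assumes the predicted and actual factors share the same levels in the same order, so the diagonal of the table holds the correct predictions. The name accuracy_rate is chosen only to avoid clashing with forecast::accuracy.

```r
# Hypothetical labels standing in for valid2.df$isBelow50k and predrpart
lv <- c("<=50K", ">50K")
actual <- factor(c(">50K", "<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K"),
                 levels = lv)
predicted <- factor(c(">50K", "<=50K", ">50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K"),
                    levels = lv)

# Rows are predictions, columns are the true labels
cm <- table(predicted = predicted, actual = actual)

# Share of all cases that were classified correctly
accuracy_rate <- sum(diag(cm)) / sum(cm)

# Treating ">50K" as the positive class:
sensitivity <- cm[">50K", ">50K"] / sum(cm[, ">50K"])     # true positive rate
specificity <- cm["<=50K", "<=50K"] / sum(cm[, "<=50K"])  # true negative rate

print(cm)
print(c(accuracy = accuracy_rate, sensitivity = sensitivity, specificity = specificity))
```

In your data the actual column was relabeled to 1 and 0, so check that the row and column order of the table line up before reading values off the diagonal.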