Decision Tree and Confusion Matrix Calculation

Time: 2017-12-21 00:36:13

Tags: r tree decision-tree roc confusion-matrix

I created a decision tree with rpart using the following code:

res.tree <- rpart(myformula, data = credit_train)

My data is split into two parts: 70% for training and 30% for testing.
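A 70/30 split like this can be sketched in base R as below. This is only an illustration under assumptions: the question doesn't show how the split was actually done, and the data frame credit and its three toy columns are hypothetical stand-ins for the real data.

```r
# Hypothetical toy data standing in for the full credit data set
set.seed(42)
n <- 1000
credit <- data.frame(
  DEFAULT  = rbinom(n, 1, 0.3),
  DURATION = sample(6:72, n, replace = TRUE),
  AMOUNT   = round(runif(n, 250, 18000))
)

# Randomly choose 70% of the row indices for training; the rest form the test set
train_idx    <- sample(seq_len(n), size = round(0.7 * n))
credit_train <- credit[train_idx, ]
credit_test  <- credit[-train_idx, ]

c(train = nrow(credit_train), test = nrow(credit_test))  # 700 and 300
```

A purely random split can leave the share of DEFAULT = 1 slightly different between the two parts; caret::createDataPartition() (caret is already loaded in the question) gives a split stratified on the outcome if that matters.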

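Building the tree works fine, and the tree is created. Where I'm stuck is the prediction step, which I need in order to compute a confusion matrix and an ROC curve.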
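For context on what the predictions are needed for: a confusion matrix only needs predicted class labels, while an ROC curve needs predicted probabilities. The sketch below uses made-up vectors (actual, predicted and prob_1 are hypothetical; in practice they would come from the test set and predict()). caret::confusionMatrix() would also work since caret is loaded, but base table() keeps the example dependency-free; the pROC part runs only if pROC is installed.

```r
# Hypothetical toy values: actual 0/1 outcomes, predicted classes, and
# predicted probabilities of class "1" (normally produced by predict())
actual    <- factor(c(0, 0, 1, 1, 0, 1, 0, 1), levels = c(0, 1))
predicted <- factor(c(0, 1, 1, 0, 0, 1, 0, 1), levels = c(0, 1))
prob_1    <- c(0.10, 0.60, 0.80, 0.40, 0.20, 0.90, 0.30, 0.70)

# Confusion matrix: rows are predicted classes, columns are actual classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Overall accuracy = correctly classified cases / all cases
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)  # 0.75

# ROC curve and AUC from the probabilities (pROC is loaded in the question)
if (requireNamespace("pROC", quietly = TRUE)) {
  roc_obj <- pROC::roc(actual, prob_1)
  print(pROC::auc(roc_obj))
}
```

With pROC available, plot(roc_obj) then draws the curve itself.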

To get the predictions, I'm using this code:

tree_pred = predict(res.tree, credit_test, type = "class")

But I get this message:

Error in predict.rpart(res.tree, credit_test, type = "class") : Invalid prediction for "rpart" object

In addition: Warning message:
'newdata' had 271 rows but variables found have 729 rows

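I can't figure out whether I haven't loaded a library, or what else keeps the type argument from being recognized (many resources say it's what I need to use), or why the number of rows doesn't match.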
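One likely cause of the "Invalid prediction" error, offered as a hedged reading rather than a confirmed diagnosis: in the data sample DEFAULT is numeric (0/1), so rpart fits a regression tree (method = "anova"), and predict.rpart only accepts type = "class" (or "prob") for classification trees. No missing library is involved. The sketch reproduces this with hypothetical toy data; toy, DURATION and AMOUNT here are illustrative, not the real data set.

```r
library(rpart)

# Hypothetical toy data: DEFAULT stored as numeric 0/1, as in the question
set.seed(1)
n <- 300
toy <- data.frame(DURATION = sample(6:72, n, replace = TRUE),
                  AMOUNT   = round(runif(n, 250, 18000)))
toy$DEFAULT <- as.numeric(toy$DURATION > 36 | toy$AMOUNT > 12000)

# Numeric response: rpart chooses method = "anova" (a regression tree),
# so predict(..., type = "class") stops with "Invalid prediction"
reg_tree <- rpart(DEFAULT ~ DURATION + AMOUNT, data = toy)
reg_tree$method
try(predict(reg_tree, toy, type = "class"))

# Factor response (or method = "class"): a classification tree,
# for which type = "class" and type = "prob" are both valid
toy$DEFAULT <- factor(toy$DEFAULT, levels = c(0, 1))
cls_tree <- rpart(DEFAULT ~ DURATION + AMOUNT, data = toy)
cls_tree$method
head(predict(cls_tree, toy, type = "class"))
```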

The 'newdata' with 271 rows is my test dataset; my training dataset has 729 rows.

Is the problem caused by how I created the decision tree, or by the prediction code?

In response to the comments: I'm using the following libraries:

library(readxl)
library(dplyr)
library(factoextra)
library(corrplot)
library(rpart)
library(rpart.plot)
library(RColorBrewer)
library(pROC)
library(Hmisc)
library(fBasics)
library(rattle)
library(caret)

A sample of my data:

structure(list(CHK_ACCT = c(0, 1, 0, 0), DURATION = c(6, 48, 
42, 24), HISTORY = c(4, 2, 2, 3), NEW_CAR = c(0, 0, 0, 1), USED_CAR = c(0, 
0, 0, 0), FURNITURE = c(0, 0, 1, 0), `RADIO/TV` = c(1, 1, 0, 
0), EDUCATION = c(0, 0, 0, 0), RETRAINING = c(0, 0, 0, 0), AMOUNT = c(1169, 
5951, 7882, 4870), SAV_ACCT = c(4, 0, 0, 0), EMPLOYMENT = c(4, 
2, 3, 2), INSTALL_RATE = c(4, 2, 2, 3), MALE_DIV = c(0, 0, 0, 
0), MALE_SINGLE = c(1, 0, 1, 1), MALE_MAR_or_WID = c(0, 0, 0, 
0), `CO-APPLICANT` = c(0, 0, 0, 0), GUARANTOR = c(0, 0, 1, 0), 
PRESENT_RESIDENT = c(4, 2, 4, 4), REAL_ESTATE = c(1, 1, 0, 
0), PROP_UNKN_NONE = c(0, 0, 0, 1), AGE = c(67, 22, 45, 53
), OTHER_INSTALL = c(0, 0, 0, 0), RENT = c(0, 0, 0, 0), OWN_RES = c(1, 
1, 0, 0), NUM_CREDITS = c(2, 1, 1, 2), JOB = c(2, 2, 2, 2
), NUM_DEPENDENTS = c(1, 1, 2, 2), TELEPHONE = c(1, 0, 0, 
0), FOREIGN = c(0, 0, 0, 0), DEFAULT = c(0, 1, 0, 1), CHK_ACCT_rec = c(1, 
2, 1, 1), SAV_ACCT_rec = c(0, 1, 1, 1)), .Names = c("CHK_ACCT", 
"DURATION", "HISTORY", "NEW_CAR", "USED_CAR", "FURNITURE", "RADIO/TV", 
"EDUCATION", "RETRAINING", "AMOUNT", "SAV_ACCT", "EMPLOYMENT", 
"INSTALL_RATE", "MALE_DIV", "MALE_SINGLE", "MALE_MAR_or_WID", 
"CO-APPLICANT", "GUARANTOR", "PRESENT_RESIDENT", "REAL_ESTATE", 
"PROP_UNKN_NONE", "AGE", "OTHER_INSTALL", "RENT", "OWN_RES", 
"NUM_CREDITS", "JOB", "NUM_DEPENDENTS", "TELEPHONE", "FOREIGN", 
"DEFAULT", "CHK_ACCT_rec", "SAV_ACCT_rec"), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))


myformula = credit_train$DEFAULT ~ credit_train$CHK_ACCT_rec + 
credit_train$DURATION + credit_train$HISTORY + credit_train$NEW_CAR + 
credit_train$USED_CAR + credit_train$FURNITURE + credit_train$`RADIO/TV` + 
credit_train$EDUCATION + credit_train$RETRAINING + credit_train$AMOUNT + 
credit_train$SAV_ACCT_rec + credit_train$EMPLOYMENT + 
credit_train$INSTALL_RATE + credit_train$MALE_DIV + credit_train$MALE_SINGLE 
+ credit_train$MALE_MAR_or_WID + credit_train$`CO-APPLICANT` + 
credit_train$GUARANTOR + credit_train$PRESENT_RESIDENT + 
credit_train$REAL_ESTATE + credit_train$PROP_UNKN_NONE + credit_train$AGE +  
credit_train$OTHER_INSTALL + credit_train$RENT + credit_train$OWN_RES + 
credit_train$NUM_CREDITS + credit_train$JOB + credit_train$NUM_DEPENDENTS + 
credit_train$TELEPHONE + credit_train$FOREIGN
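The row-count warning is consistent with how myformula is written: every term is credit_train$..., so the variables are the training vectors themselves. When predict() rebuilds the model frame from newdata, credit_test has no column called credit_train, so the expressions fall back to the 729-row training vectors instead of the 271 test rows. A minimal sketch of the usual fix, under assumptions: bare column names plus data =, and a factor outcome with method = "class". The toy train/test data and its two predictors are hypothetical.

```r
library(rpart)

# Hypothetical toy train/test data (sizes chosen to mirror the question)
set.seed(2)
make_toy <- function(n) {
  d <- data.frame(DURATION = sample(6:72, n, replace = TRUE),
                  AMOUNT   = round(runif(n, 250, 18000)))
  d$DEFAULT <- factor(as.numeric(d$DURATION > 36 | d$AMOUNT > 12000),
                      levels = c(0, 1))
  d
}
credit_train <- make_toy(729)
credit_test  <- make_toy(271)

# Bare column names + data = ...: predict() looks the same names up in newdata
myformula <- DEFAULT ~ DURATION + AMOUNT
res.tree  <- rpart(myformula, data = credit_train, method = "class")

# Class labels for the confusion matrix, P(DEFAULT = 1) for the ROC curve
tree_pred <- predict(res.tree, credit_test, type = "class")
tree_prob <- predict(res.tree, credit_test, type = "prob")[, "1"]

length(tree_pred)  # one prediction per test row
table(Predicted = tree_pred, Actual = credit_test$DEFAULT)
```

For the real data, DEFAULT ~ . - CHK_ACCT - SAV_ACCT should select the same predictors as the long formula above, and non-syntactic names such as `RADIO/TV` keep their backticks when listed explicitly. tree_prob is what pROC::roc(credit_test$DEFAULT, tree_prob) would take for the ROC curve.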

@calimo I hope this is what you needed.

0 answers