I am using rpart for the first time and there are a few things I cannot make sense of:
str(X1)
'data.frame': 7329 obs. of 4 variables:
 $ totalordered               : num 34 800 36 254 564 ...
 $ number_of_orders           : int 1 4 1 2 4 1 2 1 1 1 ...
 $ custom_field_classification: Factor w/ 5 levels "Education: Educator/Student",..: 2 2 1 2 1 1 1 2 4 3 ...
 $ CLTV                       : int 0 363 0 5 114 0 119 0 0 0 ...
>model1 <- rpart(custom_field_classification ~.,data=X1,method="class")
>model1
n= 7329

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 7329 4591 Education: Educator/Student (0.37 0.24 0.12 0.23 0.043)
  2) totalordered< 104.715 4043 1898 Education: Educator/Student (0.53 0.15 0.092 0.19 0.038) *
  3) totalordered>=104.715 3286 2151 Education: K-12 (0.18 0.35 0.15 0.27 0.051) *
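For reading the printed tree: each node line gives n (observations in the node), loss (observations in the node that are not in the predicted class), yval (the predicted class) and the class proportions. A minimal sketch in base R that redoes this arithmetic with the numbers printed above. The variable names are mine, for illustration only, and nothing here touches the fitted model1 object:

```r
# Numbers copied by hand from the printed tree; names are illustrative.
n    <- c(root = 7329, node2 = 4043, node3 = 3286)
loss <- c(root = 4591, node2 = 1898, node3 = 2151)

# Misclassification rate inside each node
# (this should correspond to "expected loss" in summary(model1)).
node_error <- loss / n
print(round(node_error, 4))

# Share of all observations that reach each node
# (this should correspond to "P(node)" in summary(model1)).
p_node <- n / n["root"]
print(round(p_node, 4))
```

The two child nodes add up to the root (4043 + 3286 = 7329), which is a quick way to check that a split has been read correctly.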
>summary(model1)
Call:
rpart(formula = custom_field_classification ~ ., data = X1, method = "class")
n= 7329
CP nsplit rel error xerror xstd
1 0.1180571 0 1.0000000 1.0000000 0.009020710
2 0.0100000 1 0.8819429 0.8832498 0.009270542
Variable importance
totalordered number_of_orders CLTV
54 26 20
Node number 1: 7329 observations, complexity param=0.1180571
predicted class=Education: Educator/Student expected loss=0.6264156 P(node) =1
class counts: 2738 1724 866 1683 318
probabilities: 0.374 0.235 0.118 0.230 0.043
left son=2 (4043 obs) right son=3 (3286 obs)
Primary splits:
totalordered < 104.715 to the left, improve=312.20560, (0 missing)
number_of_orders < 2.5 to the left, improve= 93.21119, (0 missing)
CLTV < 0.5 to the left, improve= 63.99499, (0 missing)
Surrogate splits:
number_of_orders < 1.5 to the left, agree=0.773, adj=0.493, (0 split)
CLTV < 0.5 to the left, agree=0.718, adj=0.370, (0 split)
Node number 2: 4043 observations
predicted class=Education: Educator/Student expected loss=0.4694534 P(node) =0.5516442
class counts: 2145 589 370 787 152
probabilities: 0.531 0.146 0.092 0.195 0.038
Node number 3: 3286 observations
predicted class=Education: K-12 expected loss=0.6545953 P(node) =0.4483558
class counts: 593 1135 496 896 166
probabilities: 0.180 0.345 0.151 0.273 0.051
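In case it helps anyone reading the CP table, here is a small sketch of how its rel error and CP columns appear to follow from the node losses in this particular output. It assumes rel error is the total misclassification count in the terminal nodes divided by the count at the root. The variable names are illustrative, not part of rpart:

```r
# Numbers copied by hand from the output above; names are illustrative.
root_loss <- 4591            # misclassified at the root (nsplit = 0)
leaf_loss <- c(1898, 2151)   # misclassified in terminal nodes 2 and 3

# rel error after one split: leaf errors relative to the root error.
rel_error <- sum(leaf_loss) / root_loss
print(rel_error)   # compare with the rel error column, row 2

# For this table, the CP in row 1 is the drop in rel error
# bought by the single split.
cp_first <- 1 - rel_error
print(cp_first)    # compare with the CP column, row 1

# Root class proportions from the class counts of node 1.
root_counts <- c(2738, 1724, 866, 1683, 318)
print(round(root_counts / sum(root_counts), 3))
```

The CP of 0.0100000 in the last row equals the default `cp` value in `rpart.control`, which may be why the tree stops after a single split.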