Extracting the confusion matrix from a C5.0 model

Time: 2017-02-08 11:26:18

Tags: r

After running the C5.0 algorithm on my data with

a <- C5.0(FACTOR ~ ., data = i_data, trials = 10, costs = matrix(c(0, 1, 4, 0), nrow = 2))

and printing the model summary with

summary(a)

I get output like this:

.
.
.
.

SubTree [S1]

Col_L > 89: N (195.6/6.5)
Col_L <= 89:
:...Col_Q > 4657: Y (66.6/34)
    Col_Q <= 4657:
    :...Col_F > 15: Y (117.6/75)
        Col_F <= 15:
        :...Col_C <= 5.6926: N (2040.5/266.7)
            Col_C > 5.6926: Y (148.7/104.4)

SubTree [S2]

Col_E > 14: N (2523.3/176.8)
Col_E <= 14:
:...Col_G > 5: N (83.4/1.4)
    Col_G <= 5:
    :...Col_O > 6880: Y (41.8/22)
        Col_O <= 6880:
        :...Col_G <= 3: N (1939.9/230.1)
            Col_G > 3: Y (92.7/64.5)


Evaluation on training data (53392 cases):

Trial       Decision Tree
-----  ------------------------
       Size    Errors      Cost

   0     87 16173(30.3%)   0.35
   1     25 14071(26.4%)   0.43
   2     48 15295(28.6%)   0.74
   3     50 14672(27.5%)   0.48
   4     43 16765(31.4%)   0.55
   5     52 16346(30.6%)   0.98
   6     58 18277(34.2%)   0.52
   7     65 13940(26.1%)   0.64
   8     63 14020(26.3%)   0.42
   9     57 13517(25.3%)   0.45
boost       13284(24.9%)   0.39   <<


   (a)   (b)    <-classified as
  ----  ----
 15848 10848    (a): class N
  2436 24260    (b): class Y


Attribute usage:

100.00% Col_A
100.00% Col_B
100.00% Col_C
100.00% Col_D
100.00% Col_E
 99.79% Col_F
 99.63% Col_G
 76.66% Col_H
 76.55% Col_I
 75.64% Col_J
 70.22% Col_K
 65.15% Col_L
 59.01% Col_M
 58.94% Col_N
 42.54% Col_O
 33.01% Col_P
 21.73% Col_Q
 16.58% Col_R
 12.69% Col_S
  8.43% Col_T

Is there a way to extract this part

   (a)   (b)    <-classified as
  ----  ----
 15848 10848    (a): class N
  2436 24260    (b): class Y

from the summary above, so that I can load it in another R instance?

1 Answer:

Answer 0 (score: 1)

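C5.0 only stores this table as text: the object returned by summary() keeps the whole printed report as a single string in its output element. You can still cut the table out of that string like this: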

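library(C50)

# example from ?C5.0
# note: newer C50 releases may no longer ship the churn data (see the modeldata package)
data(churn)
treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
treeModel

# save the summary in b; b$output is the printed text as one string
b <- summary(treeModel)

# start position of '(a)', the first token of the table header
pos1 <- gregexpr(pattern = '\\(a\\)', b$output)[[1]][1]
# start position of the last class label, 'class no' (in your case 'class Y')
end_label <- 'class no'
pos2 <- gregexpr(pattern = end_label, b$output)[[1]][1]
# cut up to the END of the label; stopping at pos2 itself would truncate it to 'c'
cm_text <- substr(b$output, pos1, pos2 + nchar(end_label) - 1)

# print
cat(cm_text)

Output:

(a)   (b)    <-classified as
----  ----
365   118    (a): class yes
 18  2832    (b): class no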
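The extracted block is still just text, so to load it in another R instance it helps to turn it into a numeric matrix first and save that. Below is a minimal sketch of one way to do this, applied to the table from the question. The helper name parse_confusion_text and the file name c50_confusion.rds are hypothetical and not part of C50. It assumes every cell of the table is printed as a number; if C5.0 leaves a cell blank (which I believe it does for zero counts), the row-length check fails and the parsing would need to work on column positions instead.

```r
# Hypothetical helper (not part of C50): parse the printed confusion
# matrix text into a numeric matrix with the class names as dimnames.
parse_confusion_text <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  # keep only the data rows, e.g. " 15848 10848    (a): class N"
  rows <- grep("\\([a-z]+\\): class ", lines, value = TRUE)
  # the class label is whatever follows "class "
  classes <- trimws(sub(".*: class ", "", rows))
  # the counts are whatever precedes the "(a):" style marker
  counts_txt <- trimws(sub("\\([a-z]+\\): class .*$", "", rows))
  counts <- lapply(strsplit(counts_txt, "\\s+"), as.numeric)
  # assumption: no blank cells, so each row has one count per class
  stopifnot(all(lengths(counts) == length(classes)))
  m <- do.call(rbind, counts)
  # rows are the true classes, columns are the "classified as" classes
  dimnames(m) <- list(actual = classes, predicted = classes)
  m
}

# the table text from the question
cm_text <- paste(
  "   (a)   (b)    <-classified as",
  "  ----  ----",
  " 15848 10848    (a): class N",
  "  2436 24260    (b): class Y",
  sep = "\n"
)
cm <- parse_confusion_text(cm_text)

# sanity check: misclassified cases are the off-diagonal cells
n_errors <- sum(cm) - sum(diag(cm))

# save to disk; another R session can restore it with readRDS()
path <- file.path(tempdir(), "c50_confusion.rds")
saveRDS(cm, path)
cm_restored <- readRDS(path)
print(cm_restored)
```

For the question's table, n_errors comes out as 13284 of 53392 cases, which matches the boost row of the summary (24.9%). In practice you would write to a fixed path rather than tempdir() so that the second R session can find the file.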