Reproducing the boosting trials of C5.0

Time: 2017-09-13 15:46:58

Tags: r machine-learning decision-tree boosting

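I am using the C50 package in R and need to export the model for production.

I use the boosting option, and I know the trials are weighted, but the weights are not given in my output.

I am not using the weights option for misclassification; I only need the trial weights.

Is there a way in R to find the weight of each trial in my C5.0 model?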

1 answer:

Answer 0: (score: 0)

> fit <- C5.0(credit[,-24], credit[,24])
> summary(fit)

Call:
C5.0.default(x = credit[, -24], y = credit[, 24])


C5.0 [Release 2.07 GPL Edition]     Thu Nov 23 09:36:14 2017
-------------------------------

Class specified by attribute `outcome'

Read 30000 cases (24 attributes) from undefined.data

Decision tree:

PAY_0 > 1:
:...EDUCATION > 3: 0 (29/7)
:   EDUCATION <= 3:
:   :...PAY_3 <= -1: 0 (187/86)
:       PAY_3 > -1: 1 (2914/830)
PAY_0 <= 1:
:...PAY_2 <= 1: 0 (24599/3514)
    PAY_2 > 1:
    :...PAY_6 <= 0: 0 (1625/605)
        PAY_6 > 0:
        :...PAY_6 > 2: 1 (58/21)
            PAY_6 <= 2:
            :...PAY_5 <= 0: 0 (132/52)
                PAY_5 > 0:
                :...SEX <= 1: 1 (215/82)
                    SEX > 1:
                    :...PAY_3 <= 1: 1 (40/13)
                        PAY_3 > 1: 0 (201/91)


Evaluation on training data (30000 cases):

        Decision Tree   
      ----------------  
      Size      Errors  

        10 5301(17.7%)   <<


       (a)   (b)    <-classified as
      ----  ----
     22418   946    (a): class 0
      4355  2281    (b): class 1


    Attribute usage:

    100.00% PAY_0
     89.57% PAY_2
     11.14% PAY_3
     10.43% EDUCATION
      7.57% PAY_6
      1.96% PAY_5
      1.52% SEX


Time: 2.5 secs

The weights (importance) of all variables used can be found with C5imp:

> C5imp(fit, metric = "splits")
           Overall
PAY_3     22.22222
PAY_6     22.22222
EDUCATION 11.11111
PAY_0     11.11111
PAY_2     11.11111
PAY_5     11.11111
SEX       11.11111
LIMIT_BAL  0.00000
MARRIAGE   0.00000
AGE        0.00000
PAY_4      0.00000
BILL_AMT1  0.00000
BILL_AMT2  0.00000
BILL_AMT3  0.00000
BILL_AMT4  0.00000
BILL_AMT5  0.00000
BILL_AMT6  0.00000
PAY_AMT1   0.00000
PAY_AMT2   0.00000
PAY_AMT3   0.00000
PAY_AMT4   0.00000
PAY_AMT5   0.00000
PAY_AMT6   0.00000
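
C5imp reports variable importance, not per-trial information, and the model above was fitted without boosting. As a minimal sketch of where per-trial information might be inspected, the code below fits a boosted model and prints the trials and boostResults elements that ?C5.0 documents on the fitted object. It assumes the C50 package is installed, uses the built-in iris data as a stand-in for the credit data, and the name fit_boost is made up for illustration. Whether these elements contain the trial weights asked about in the question is not confirmed here.

```r
# Minimal sketch; assumes the C50 package is installed.
library(C50)

# iris is a stand-in for the credit data, which is not available here.
set.seed(1)
fit_boost <- C5.0(x = iris[, -5], y = iris[, 5], trials = 10)

# Named vector: trials requested and trials actually used
# (C5.0 may stop boosting early).
print(fit_boost$trials)

# Parsed version of the per-trial table from the verbose output.
# This can be NULL if the table could not be parsed.
print(fit_boost$boostResults)

# Class probabilities produced by the boosted ensemble for a few rows.
probs <- predict(fit_boost, newdata = head(iris[, -5]), type = "prob")
print(probs)
```

Calling summary(fit_boost) prints one tree per trial followed by the per-trial evaluation table, which is the raw text that boostResults is parsed from.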