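I'm using the C50 package in R, and I need to export the model for production.
I use the boosting option. I know the trials are weighted, but the weights are not shown in my output.
I'm not using the weights option for misclassification; I only need the trial weights.
Is there any way, from R, to find the weight of each trial of my C5.0 model?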
Answer 0 (score: 0)
> fit <- C5.0(credit[,-24], credit[,24])
> summary(fit)
Call:
C5.0.default(x = credit[, -24], y = credit[, 24])
C5.0 [Release 2.07 GPL Edition] Thu Nov 23 09:36:14 2017
-------------------------------
Class specified by attribute `outcome'
Read 30000 cases (24 attributes) from undefined.data
Decision tree:

PAY_0 > 1:
:...EDUCATION > 3: 0 (29/7)
:   EDUCATION <= 3:
:   :...PAY_3 <= -1: 0 (187/86)
:       PAY_3 > -1: 1 (2914/830)
PAY_0 <= 1:
:...PAY_2 <= 1: 0 (24599/3514)
    PAY_2 > 1:
    :...PAY_6 <= 0: 0 (1625/605)
        PAY_6 > 0:
        :...PAY_6 > 2: 1 (58/21)
            PAY_6 <= 2:
            :...PAY_5 <= 0: 0 (132/52)
                PAY_5 > 0:
                :...SEX <= 1: 1 (215/82)
                    SEX > 1:
                    :...PAY_3 <= 1: 1 (40/13)
                        PAY_3 > 1: 0 (201/91)

Evaluation on training data (30000 cases):

        Decision Tree
      ----------------
      Size      Errors

        10 5301(17.7%)   <<

       (a)   (b)    <-classified as
      ----  ----
     22418   946    (a): class 0
      4355  2281    (b): class 1

Attribute usage:
100.00% PAY_0
89.57% PAY_2
11.14% PAY_3
10.43% EDUCATION
7.57% PAY_6
1.96% PAY_5
1.52% SEX
Time: 2.5 secs
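The weight (importance) of every variable used in the model can be found with C5imp. With metric = "splits", each value is the percentage of all splits in the model that use that variable:
> C5imp(fit, metric = "splits")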
Overall
PAY_3 22.22222
PAY_6 22.22222
EDUCATION 11.11111
PAY_0 11.11111
PAY_2 11.11111
PAY_5 11.11111
SEX 11.11111
LIMIT_BAL 0.00000
MARRIAGE 0.00000
AGE 0.00000
PAY_4 0.00000
BILL_AMT1 0.00000
BILL_AMT2 0.00000
BILL_AMT3 0.00000
BILL_AMT4 0.00000
BILL_AMT5 0.00000
BILL_AMT6 0.00000
PAY_AMT1 0.00000
PAY_AMT2 0.00000
PAY_AMT3 0.00000
PAY_AMT4 0.00000
PAY_AMT5 0.00000
PAY_AMT6 0.00000
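
For the boosted case in the question, the same calls can be tried on a model fitted with several trials. The following is only a rough sketch under some assumptions: the built-in iris data stands in for the credit data, trials = 10 is an arbitrary choice, and the names boosted, usage_imp and splits_imp are made up for illustration. Note that C5imp reports the importance of predictors, not a weight per boosting trial; as far as I can tell, the per-trial information available on the fitted object is the trial count and the parsed boosting table, so this may not fully cover what the question asks for.

```r
# Sketch only: iris is a stand-in for the credit data from the answer.
library(C50)

# Boosted fit; trials = 10 is an arbitrary value chosen for illustration.
boosted <- C5.0(x = iris[, -5], y = iris$Species, trials = 10)

# Requested vs. actually built number of boosting trials
# (C5.0 may stop early, so "Actual" can be smaller than "Requested").
print(boosted$trials)

# Parsed version of the per-trial boosting table from the printed output.
# This may be NULL, for example if boosting was abandoned after one trial.
print(boosted$boostResults)

# Variable importance under both metrics supported by C5imp.
usage_imp  <- C5imp(boosted, metric = "usage")   # % of training cases reaching a split on the variable
splits_imp <- C5imp(boosted, metric = "splits")  # % of splits that use the variable
print(usage_imp)
print(splits_imp)
```

The full text output, including one tree per trial, is still available through summary(boosted) if the individual trees are needed for export.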