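Extracting split values from an rpart object in R

Time: 2019-05-19 16:25:06

Tags: r decision-tree rpart

I can't find the split values (or other per-node data) in an rpart object. I can see them in the output of summary(sample_model), but not in a list or data frame.

Some sample data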

foo.df <- structure(list(type = c("fudai", "fudai", "fudai", "fudai", "fudai", 
                              "fudai", "fudai", "tozama", "fudai", "fudai", "tozama", "tozama", 
                              "fudai", "tozama", "fudai", "fudai", "tozama", "fudai", "fudai", 
                              "tozama", "fudai", "fudai", "fudai", "tozama", "fudai", "fudai", 
                              "tozama", "fudai", "fudai", "fudai", "fudai", "fudai", "tozama", 
                              "fudai", "fudai", "fudai", "fudai", "fudai", "fudai", "tozama", 
                              "tozama", "fudai", "tozama", "tozama", "tozama", "tozama", "fudai", 
                              "fudai", "tozama", "tozama"), distance = c(12.5366985071383, 
                                                                         272.697138147139, 40.4780423740381, 109.806349869662, 147.781805212839, 
                                                                         89.4280438527415, 49.1425850803745, 555.414271440522, 119.365138867582, 
                                                                         182.902536555383, 310.019126513348, 277.122207392514, 214.510428881317, 
                                                                         235.111617874157, 104.494518693549, 50.7561853895564, 343.308898045237, 
                                                                         151.796857505073, 36.0391449169937, 30.8214406651022, 343.294467363406, 
                                                                         135.841501028422, 154.798119311647, 317.739208576563, 3.33794280697559, 
                                                                         98.9182898110913, 422.915369767251, 194.957988642709, 87.6548263591412, 
                                                                         187.571370158631, 236.292608259126, 17.915709270268, 193.548578374405, 
                                                                         262.190146422316, 21.6219797945323, 121.199009527283, 261.670997612517, 
                                                                         202.2051991431, 125.418459536787, 275.964068539003, 190.112226847932, 
                                                                         20.1753302760961, 488.80323504215, 579.25515722891, 233.500797034697, 
                                                                         207.588349435329, 183.770003408524, 168.739293254246, 313.140075747773, 
                                                                         131.69228390613), age = c(1756, 1711, 1712, 1746, 1868, 1866, 
                                                                                                   1682, 1617, 1771, 1764, 1672, 1636, 1864, 1704, 1762, 1868, 1694, 
                                                                                                   1749, 1703, 1616, 1691, 1702, 1723, 1683, 1742, 1691, 1623, 1721, 
                                                                                                   1704, 1745, 1749, 1723, 1639, 1661, 1843, 1845, 1669, 1698, 1698, 
                                                                                                   1664, 1868, 1633, 1783, 1642, 1615, 1648, 1734, 1758, 1725, 1635
                                                                         )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                                                                                     -50L))

And a basic model

library("rpart")
sample_model <- rpart(formula = type ~ ., 
                  data = foo.df, 
                  method = "class",
                  control = rpart.control(xval = 50, minbucket = 5, cp = 0.05),
                  parms = list(split = "gini"))

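The rpart documentation says there should be a column called "splits" in sample_model$frame, but it isn't there. Quote: "splits, a two column matrix of left and right split labels for each node" https://www.rdocumentation.org/packages/rpart/versions/4.1-15/topics/rpart.object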

Where are those columns in sample_model$frame? I do, however, see the data I want in the output of

summary(sample_model)

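What is going on here?

1 answer:

Answer 0: (score: 1)

I see that in the documentation too, but it does not appear to describe the current structure. The $splits item is a separate list element of the fitted object: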

  sample_model$splits

 #----------

         count ncat  improve     index adj
distance    50   -1 9.134639  274.3306   0
age         50    1 7.910588 1687.0000   0
age         39    1 6.062937 1654.5000   0
distance    39   -1 1.950142  188.8418   0
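The splits matrix holds more than one row per node: for each internal node it lists the primary split first, followed by any competing and surrogate splits. To pull out only the primary split of each node, the rows can be matched against the ncompete and nsurrogate columns of the frame. The following is a minimal sketch, not a definitive recipe. It assumes the kyphosis data set bundled with rpart as a stand-in for the question's data, and the names fit, fr, irow and primary are illustrative. The index arithmetic follows my reading of how rpart's own labels method walks the matrix.

```r
library(rpart)

# Assumption: kyphosis (shipped with rpart) stands in for the question's data.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

fr <- fit$frame
is_leaf <- fr$var == "<leaf>"

# Each internal node owns 1 primary + ncompete + nsurrogate rows of fit$splits;
# leaves own none. The cumulative sum gives the first (primary) row per node.
first_row <- cumsum(c(1, fr$ncompete + fr$nsurrogate + (!is_leaf)))
irow <- first_row[c(!is_leaf, FALSE)]

# One row per internal node: node number, split variable, and split value.
# For a continuous variable "index" is the cut point; for a factor it is a
# row number into fit$csplit instead.
primary <- data.frame(
  node  = as.integer(rownames(fr)[!is_leaf]),
  var   = as.character(fr$var[!is_leaf]),
  ncat  = fit$splits[irow, "ncat"],
  split = fit$splits[irow, "index"],
  row.names = NULL
)
print(primary)
```

The same lines should work on sample_model in place of fit, since they rely only on the frame and splits components.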

To see the full structure of sample_model, run:

str(sample_model)
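A shorter check than reading the whole str() output is to compare the top-level component names with the column names of the frame. This is a small sketch under the assumption of a recent rpart 4.1-x release, again using the bundled kyphosis data rather than the question's data; fit, has_top_level_splits and has_frame_splits are illustrative names.

```r
library(rpart)

# Assumption: kyphosis (shipped with rpart) stands in for the question's data.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Top-level components of the fitted object; "splits" is expected among them.
print(names(fit))

# Columns of the frame data frame; no "splits" column is expected here.
print(colnames(fit$frame))

has_top_level_splits <- "splits" %in% names(fit)
has_frame_splits <- "splits" %in% colnames(fit$frame)
```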

I was not able to confirm my hunch that the documentation lags behind the code:

news(grepl('splits', Text), 'rpart')     #--------------------
  

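Changes in version 4.1-0

     Surrogate splits are now considered only if they send two or more cases with non-zero weight each way. For numeric/ordinal variables the restriction to non-zero weights is new: for categorical variables this is a new restriction. Surrogate splits which improve only by rounding error over the default split are no longer returned. Where weights and missing values are present, the splits component for some of these was not returned correctly.

Changes in version 4.0-1

     A more major change was an error in the asymmetric loss matrix, prompted by a user query. With L = loss asymmetric, the altered priors were computed incorrectly: they were using L' instead of L. Upshot: the tree would not necessarily choose optimal splits for the given loss matrix. Once chosen, splits were evaluated correctly. The printed "improvement" values are of course the wrong ones as well. It is interesting that for my little test case, with L quite asymmetric, the early splits in the tree are unchanged; a good split still looks good.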

For a canonical answer, you would need to contact the maintainer:

 maintainer('rpart')