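I can't find the split values (or other data) for the nodes in an rpart object. I can see them in the output of summary(sample_model), but not in a list or data frame.
Some sample data: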
foo.df <- structure(list(type = c("fudai", "fudai", "fudai", "fudai", "fudai",
"fudai", "fudai", "tozama", "fudai", "fudai", "tozama", "tozama",
"fudai", "tozama", "fudai", "fudai", "tozama", "fudai", "fudai",
"tozama", "fudai", "fudai", "fudai", "tozama", "fudai", "fudai",
"tozama", "fudai", "fudai", "fudai", "fudai", "fudai", "tozama",
"fudai", "fudai", "fudai", "fudai", "fudai", "fudai", "tozama",
"tozama", "fudai", "tozama", "tozama", "tozama", "tozama", "fudai",
"fudai", "tozama", "tozama"), distance = c(12.5366985071383,
272.697138147139, 40.4780423740381, 109.806349869662, 147.781805212839,
89.4280438527415, 49.1425850803745, 555.414271440522, 119.365138867582,
182.902536555383, 310.019126513348, 277.122207392514, 214.510428881317,
235.111617874157, 104.494518693549, 50.7561853895564, 343.308898045237,
151.796857505073, 36.0391449169937, 30.8214406651022, 343.294467363406,
135.841501028422, 154.798119311647, 317.739208576563, 3.33794280697559,
98.9182898110913, 422.915369767251, 194.957988642709, 87.6548263591412,
187.571370158631, 236.292608259126, 17.915709270268, 193.548578374405,
262.190146422316, 21.6219797945323, 121.199009527283, 261.670997612517,
202.2051991431, 125.418459536787, 275.964068539003, 190.112226847932,
20.1753302760961, 488.80323504215, 579.25515722891, 233.500797034697,
207.588349435329, 183.770003408524, 168.739293254246, 313.140075747773,
131.69228390613), age = c(1756, 1711, 1712, 1746, 1868, 1866,
1682, 1617, 1771, 1764, 1672, 1636, 1864, 1704, 1762, 1868, 1694,
1749, 1703, 1616, 1691, 1702, 1723, 1683, 1742, 1691, 1623, 1721,
1704, 1745, 1749, 1723, 1639, 1661, 1843, 1845, 1669, 1698, 1698,
1664, 1868, 1633, 1783, 1642, 1615, 1648, 1734, 1758, 1725, 1635
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-50L))
And a basic model:
library("rpart")
sample_model <- rpart(formula = type ~ .,
data = foo.df,
method = "class",
control = rpart.control(xval = 50, minbucket = 5, cp = 0.05),
parms = list(split = "gini"))
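The rpart documentation says there should be a column called "splits" in sample_model$frame, but it is not there. Quote: "splits: a two-column matrix of left and right split labels for each node" https://www.rdocumentation.org/packages/rpart/versions/4.1-15/topics/rpart.object
Where are those columns in sample_model$frame or in sample_model? I do, however, see the data I want in the output of summary(sample_model).
What is going on here?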
Answer 0: (score: 1)
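The documentation does not seem to describe the current structure. The $splits item is a separate list element of the fitted object, not a column of $frame: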
sample_model$splits
#----------
count ncat improve index adj
distance 50 -1 9.134639 274.3306 0
age 50 1 7.910588 1687.0000 0
age 39 1 6.062937 1654.5000 0
distance 39 -1 1.950142 188.8418 0
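As a quick check of that claim, the sketch below fits a tree on the built-in iris data (a stand-in chosen for illustration, not the data from the question) and looks for splits both among the top-level components of the fitted object and among the columns of its frame.

```r
library(rpart)

# Stand-in model on a built-in dataset (assumption: iris replaces foo.df here)
fit <- rpart(Species ~ ., data = iris, method = "class")

names(fit)            # top-level components of the rpart object
colnames(fit$frame)   # columns of the per-node table

in_object <- "splits" %in% names(fit)           # is splits a list element?
in_frame  <- "splits" %in% colnames(fit$frame)  # is splits a frame column?
print(c(in_object = in_object, in_frame = in_frame))
```

The same two checks can be run on sample_model directly.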
To see the complete structure of sample_model, run:
str(sample_model)
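To get the split value for each node into a data frame, the rows of $splits can be matched to the internal nodes of $frame. The following is a minimal sketch, not an official rpart API. It assumes the rows of $splits are grouped per internal node, with the primary split first, followed by that node's competitor and surrogate splits (the layout rpart's own printing code appears to rely on). The iris data and the names fit, primary_idx and primary are illustrative choices, not from the question.

```r
library(rpart)

# Stand-in model on a built-in dataset (assumption: iris replaces foo.df here)
fit <- rpart(Species ~ ., data = iris, method = "class")

frame    <- fit$frame
internal <- frame$var != "<leaf>"   # TRUE for nodes that were split

# Assumed layout: each internal node contributes 1 primary row plus
# ncompete competitor rows and nsurrogate surrogate rows to fit$splits.
rows_per_node <- 1 + frame$ncompete[internal] + frame$nsurrogate[internal]

# Row of fit$splits holding the primary split of each internal node
primary_idx <- cumsum(c(1, head(rows_per_node, -1)))

primary <- data.frame(
  node        = as.integer(rownames(frame)[internal]),
  variable    = as.character(frame$var[internal]),
  split_value = unname(fit$splits[primary_idx, "index"]),
  improve     = unname(fit$splits[primary_idx, "improve"])
)
print(primary)
```

For numeric predictors the index column holds the cut point. For a categorical predictor it instead refers to a row of the csplit component, so split_value would need a further lookup in that case.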
I was unable to confirm my hunch that the documentation lags behind the code:
news(grepl('splits', Text), 'rpart') #--------------------
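Changes in version 4.1-0:
Surrogate splits are now considered only if they send two or more cases with non-zero weight each way. For numeric/ordered variables the restriction to non-zero weights is new; for categorical variables this is a new restriction. Surrogate splits that improve on the default split only by rounding error are no longer returned. When weights and missing values were both present, the splits component for some of these was not returned correctly.
Changes in version 4.0-1:
The other major change was an error for asymmetric loss matrices, prompted by a user query. With L = loss asymmetric, the altered priors were computed incorrectly: they used L' instead of L. Upshot: the tree would not necessarily choose optimal splits for the given loss matrix. Once chosen, splits were evaluated correctly. The printed "improvement" values are of course wrong as well. Interestingly, for my little test case, with L quite asymmetric, the early splits in the tree were unchanged; a good split still looks good.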
For a canonical answer, you would need to contact the maintainer:
maintainer('rpart')