Choosing among predictors that give the same improvement

Time: 2018-11-27 12:08:54

Tags: r decision-tree rpart

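I am trying to implement RPART so that I can build on it later. So far it only handles regression (ANOVA) models. Everything looks clean except for one thing: how RPART chooses the best split among several predictors that have the same improvement.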
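For context, the ANOVA split search being discussed can be sketched roughly as follows. This is a minimal Python illustration, not RPART's C code. The names `sum_sq` and `best_anova_split` are hypothetical, and two details are assumptions: the improvement is expressed as a fraction of the parent node's sum of squares, and the threshold is taken as the midpoint between adjacent sorted values.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_anova_split(x, y):
    """Return (threshold, improvement) for the best 'x < threshold' split.

    Assumption: improvement is the fraction of the parent sum of squares
    removed by the split, and the threshold is a midpoint.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    parent_ss = sum_sq(ys)
    if parent_ss == 0.0:
        return (None, 0.0)  # nothing left to explain
    best = (None, 0.0)
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # cannot split between equal x values
        child_ss = sum_sq(ys[:k]) + sum_sq(ys[k:])
        improvement = (parent_ss - child_ss) / parent_ss
        if improvement > best[1]:  # strict: the first best split is kept
            best = ((xs[k - 1] + xs[k]) / 2, improvement)
    return best


# Two predictors that order the observations identically yield the same
# partition and the same improvement, but different thresholds.
y = [0, 0, 1, 1]
print(best_anova_split([1, 2, 3, 4], y))
print(best_anova_split([1, 8, 27, 64], y))
```

Under these assumptions, any monotone transformation of a predictor produces the same partition, which is the sense in which the predictors described here behave as perfect substitutes.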

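For example, for the initial split I have three predictors that give the same result (same improvement, same split, perfect surrogates for each other), namely X310, X312 and X317. By default RPART picks X312, even though it is not the first of these predictors in column order. If I permute the columns, RPART picks X312 or X317, but never X310.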

Here is an example of the summary when X312 is chosen:

Node number 1: 100 observations,    complexity param=0.7123717
  mean=0.5155042, MSE=0.08350028
  left son=2 (47 obs) right son=3 (53 obs)
  Primary splits:
      X312 < 0.03673   to the left,  improve=0.7123717, (0 missing)
      X317 < 0.0187715 to the left,  improve=0.7123717, (0 missing)
      X310 < 0.0440585 to the left,  improve=0.7123717, (0 missing)
      X318 < 0.0167545 to the left,  improve=0.7123435, (0 missing)
      X323 < 0.0101715 to the left,  improve=0.7092180, (0 missing)

And when it chooses X317:

Node number 1: 100 observations,    complexity param=0.7123717
  mean=0.5155042, MSE=0.08350028
  left son=2 (47 obs) right son=3 (53 obs)
  Primary splits:
      X317 < 0.0187715 to the left,  improve=0.7123717, (0 missing)
      X312 < 0.03673   to the left,  improve=0.7123717, (0 missing)
      X310 < 0.0440585 to the left,  improve=0.7123717, (0 missing)
      X318 < 0.0167545 to the left,  improve=0.7123435, (0 missing)
      X323 < 0.0101715 to the left,  improve=0.7092180, (0 missing)

Again, everything is identical. I tried looking through RPART's C code but could not find any additional check. Any ideas would be much appreciated.
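One thing that may be worth ruling out when reproducing this is whether the improvements are truly equal as floating-point numbers, since the summary prints only seven significant digits. The Python sketch below is purely illustrative: `pick_split` is a hypothetical helper, the improvement values are made up, and nothing here is taken from RPART's source. It shows how values that print identically can still produce a winner that is not first in column order under a strict greater-than comparison.

```python
def pick_split(candidates):
    """Return the name with the largest improvement.

    candidates is a list of (name, improvement) pairs in column order.
    The comparison is strict, so on an exact tie the earliest one wins.
    """
    best_name, best_improve = None, float("-inf")
    for name, improve in candidates:
        if improve > best_improve:
            best_name, best_improve = name, improve
    return best_name


# Hypothetical values: equal to 7 significant digits, unequal as doubles.
candidates = [
    ("X310", 0.71237170000001),
    ("X312", 0.71237170000003),
    ("X317", 0.71237170000002),
]

for name, improve in candidates:
    # Rounded form (as in a summary) next to the full representation.
    print(name, "%.7g" % improve, repr(improve))

print(pick_split(candidates))
```

If the values in a real fit turn out to be bitwise identical, this explanation does not apply and the ordering has to come from somewhere else in the split bookkeeping.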

0 answers