How to set a different kind of bar plot in terminal nodes?

Date: 2017-12-18 06:54:47

Tags: r tree party

I am running a MOB tree on a dataset and I want to modify the plots in the terminal nodes. I would like each terminal node to show a bar chart of the coefficients of the model that MOB fitted in that node.

For example, I run the MOB tree on the "PimaIndiansDiabetes" dataset from the "mlbench" package. Here is the code:

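library("partykit")
data("PimaIndiansDiabetes", package = "mlbench")

## response ~ regressor | partitioning variables
pid_formula <- diabetes ~ glucose | pregnant + pressure + triceps +
  insulin + mass + pedigree + age
## logistic regression fitted within each node
logit <- function(y, x, start = NULL, weights = NULL, offset = NULL, ...) {
  glm(y ~ 0 + x, family = binomial, start = start, ...)
}
pid_tree <- mob(pid_formula, data = PimaIndiansDiabetes, fit = logit)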

Then I have a model for each node. For example, for node 2 the fitted model is logit(diabetes) = -9.95 + 0.058 * glucose. I want to make bar charts from these coefficients (e.g., -9.95 and 0.058 for node 2) and use these bar charts as the terminal nodes in the final tree plot. Any idea how to do that? Thanks in advance.

1 Answer:

Answer 0 (score: 1)

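To implement such a graphic in partykit you would have to write a new panel function for the plot() method (or rather a panel-generating function). The starting point could be partykit::node_barplot, which first extracts the fitted probabilities of a classification tree and then draws them using the grid package. Instead, you could extract the estimated parameters with coef() and then draw these using grid. It is a bit technical but not extremely complicated.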
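
As a rough illustration of that idea, the following is a minimal sketch of such a panel-generating function. The name node_coefplot, its fill argument, and the layout proportions are hypothetical and not part of partykit; the sketch assumes that coef() returns a matrix with one row per terminal node (as in the output shown further down) and has not been checked against every partykit version.

```r
library("partykit")
library("grid")

## Hypothetical panel-generating function (NOT part of partykit):
## draws one bar per estimated coefficient in each terminal node.
node_coefplot <- function(obj, fill = "lightgray", ...) {
  cf <- coef(obj)                           # assumed: terminal nodes x coefficients
  k <- ncol(cf)
  ylim <- range(c(0, cf))
  ylim <- ylim + c(-0.1, 0.1) * diff(ylim)  # small margin, same scale in all nodes

  ## panel function that plot() calls once per terminal node
  function(node) {
    b <- unname(cf[as.character(id_node(node)), ])

    ## white box with the node label
    pushViewport(viewport(width = 0.9, height = 0.9))
    grid.rect(gp = gpar(fill = "white"))
    grid.text(paste("Node", id_node(node)),
      y = unit(1, "npc") - unit(0.8, "lines"))

    ## plotting region with native x/y scales
    pushViewport(viewport(x = 0.6, y = 0.5, width = 0.6, height = 0.6,
      xscale = c(0.5, k + 0.5), yscale = ylim))
    ## bars start at zero and extend up or down
    grid.rect(x = unit(seq_len(k), "native"), y = unit(pmin(b, 0), "native"),
      width = unit(0.8, "native"), height = unit(abs(b), "native"),
      just = c("centre", "bottom"), gp = gpar(fill = fill))
    grid.lines(x = unit(c(0, 1), "npc"), y = unit(c(0, 0), "native"))
    grid.yaxis()
    grid.xaxis(at = seq_len(k), label = colnames(cf))
    upViewport(2)
  }
}
## mark it as a generator so that plot() calls it with the tree first
class(node_coefplot) <- "grapcon_generator"

## Usage, assuming pid_tree from the question exists:
## plot(pid_tree, terminal_panel = node_coefplot, tnex = 2)
```

All nodes share one y-axis here so that bars are comparable across nodes; with an intercept near -10 and a slope near 0.05 the slope bars will be almost invisible.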

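However, I would not recommend implementing such a function. The reason is that this display is best suited to comparing the different coefficients within the same node. But as slope and intercept are on completely different scales, this is not easy to interpret. Instead, one should give more emphasis to differences in the same coefficient across nodes. The basis for this would also be: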

coef(pid_tree)
##   x(Intercept)   xglucose
## 2    -9.951510 0.05870786
## 4    -6.705586 0.04683748
## 5    -2.770954 0.02353582

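Additionally, one could consider the corresponding standard errors for confidence intervals. (Keep in mind that these have to be taken with a grain of salt, though: they do not adjust for estimating the tree but pretend the terminal groups were given exogenously. Still, they are useful as rough yardsticks.) I include a small convenience function for doing this: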

confintplot <- function(object, ylim = NULL,
  xlab = "Parameter per node", ylab = "Estimate",
  main = "", index = NULL, ...)
{
  ## point estimates and interval
  cf <- coef(object)
  node <- nodeids(object, terminal = TRUE)
  ci <- nodeapply(object, ids = node, FUN = function(n)
                  confint(info_node(n)$object, ...))
  if (!is.null(index)) {
    cf <- cf[, index, drop = FALSE]
    ci <- lapply(ci, "[", index, , drop = FALSE)
  }
  cfnm <- rownames(ci[[1L]])
  nodenm <- rownames(cf)

  ## set up dimensions
  n <- length(ci)
  k <- nrow(ci[[1L]])
  at <- t(outer(1:k, seq(-0.15, 0.15, length.out = n), "+"))

  ## empty plot
  if(is.null(ylim)) ylim <- range(unlist(ci))
  plot(0, 0, type = "n", xlim = range(at), ylim = ylim,
    xlab = xlab, ylab = ylab, main = main, axes = FALSE)

  ## draw every parameter
  for(i in 1L:k) {
    arrows(at[,i], sapply(ci, "[", i, 1L), at[,i], sapply(ci, "[", i, 2L),
      code = 3, angle = 90, length = 0.05)
    points(at[, i], cf[, cfnm[i]], pch = 19, col = "white", cex=1.15)
    points(at[, i], cf[, cfnm[i]], pch = nodenm, cex = 0.65)
  }

  axis(1, at = 1:k, labels = cfnm)
  axis(2)
  box()
}

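Using this we can create one plot for each parameter (intercept vs. slope) separately. This shows that the intercept increases across nodes while the slope decreases.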

par(mfrow = c(1, 2))
confintplot(pid_tree, index = 1)
confintplot(pid_tree, index = 2)

[Figure: confintplot1]

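It is also possible to show these on a common y-axis. However, this completely obscures the changes in the slope because of the different scales: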

confintplot(pid_tree)

[Figure: confintplot2]

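Final comment: I would recommend using glmtree() for this particular kind of model instead of mob() "by hand". The former is faster and provides some extra features, especially easy prediction.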
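
For reference, a minimal sketch of what the glmtree() version could look like, using the same formula and data as in the question. The object name pid_tree2 is just an illustrative choice, and the prediction call assumes the default behavior of predict() for glmtree objects with type = "response".

```r
library("partykit")
data("PimaIndiansDiabetes", package = "mlbench")

## same model as mob() with the hand-written logit() fitter
pid_tree2 <- glmtree(diabetes ~ glucose | pregnant + pressure + triceps +
  insulin + mass + pedigree + age,
  data = PimaIndiansDiabetes, family = binomial)

## per-node coefficients, one row per terminal node
coef(pid_tree2)

## predicted probabilities for a few observations
p <- predict(pid_tree2, newdata = head(PimaIndiansDiabetes), type = "response")
p
```

The coefficient columns are then labeled without the "x" prefix that appears when the model is fitted through the custom logit() function.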