Which linear model summary row corresponds to which term in the formula?

Time: 2015-04-18 14:08:54

Tags: r formula

The summary of a linear model uses certain strings to label the coefficients in its output. For example:

summary(lm(
 target ~ some.bool + some.factor + some.factor*some.value +
          some.factor:some.other,
 data.frame(target=rnorm(100), some.bool=sample(c(T, F), 100, T),
  some.factor=sample(c('Y', 'N', 'M'), 100, T), some.value=rnorm(100),
  some.other=rnorm(100))))

produces a coefficient table whose rows (besides the intercept) are named:

some.boolTRUE
some.factorN
some.factorY
some.value
some.factorN:some.value
some.factorY:some.value
some.factorM:some.other
some.factorN:some.other
some.factorY:some.other
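These row names can also be read off the fitted model programmatically, which is the natural starting point for any mapping back to formula terms. A minimal sketch, assuming a hypothetical data frame `d` shaped like the one above (deterministic `some.bool` and `some.factor` columns are used here only so that every level is guaranteed to appear):

```r
# Hypothetical data for illustration; column names follow the example above
set.seed(42)
d <- data.frame(
  target      = rnorm(120),
  some.bool   = rep(c(TRUE, FALSE), 60),
  some.factor = rep(c("Y", "N", "M"), 40),
  some.value  = rnorm(120),
  some.other  = rnorm(120)
)

fit <- lm(target ~ some.bool + some.factor + some.factor*some.value +
            some.factor:some.other, d)

# Row names of the coefficient table printed by summary()
coef.names <- rownames(coef(summary(fit)))
print(coef.names)
```

Note that the vector also contains `(Intercept)`, which has no counterpart among the formula's summands.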

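How can I programmatically find out which rows of the table correspond to which terms of the input formula? I would like to get a mapping such as: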

`some.boolTRUE`            → some.bool
`some.factorN`             → some.factor, some.factor*some.value
`some.factorY`             → some.factor, some.factor*some.value
`some.value`               → some.factor*some.value
`some.factorN:some.value`  → some.factor*some.value
`some.factorN:some.other`  → some.factor:some.other

My goal is to prepare a specific presentation of the results in which the linear regression output is grouped by input term.

1 Answer:

Answer 0: (score: 0)

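So, I noticed that the code that generates these names is buried deep inside the model.matrix function, in a call to an external C function. I can recover the names built from a given term with a hack like this (term is an expression/symbol object taken from the formula itself):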

names.for.term <- function(term, data, order.as.in=term) {
  # construct a simple formula that has only the requested term
  f <- formula(substitute(~ x, list(x=term)))

  # make a terms object for manipulation
  term.terms <- terms(f, data=data)

  # what order do we want to consider variables in?
  requested.order <- na.omit(match(
    row.names(attr(terms(order.as.in), 'factors')),
    row.names(attr(term.terms, 'factors'))))

  # force the order of variables (setting row.names is enough;
  # values in this array are not important for the process of building
  # strings if you have only a single summand. if not, good luck)
  row.names(attr(term.terms, 'factors')) <-
    row.names(attr(term.terms, 'factors'))[requested.order]

  # we need model frame object to have columns in the same order as
  # rows above; types of variables (e.g. factors) are inferred from here
  m <- model.frame(f, data)[requested.order]

  # call deep into C code
  dimnames(.External2(stats:::C_modelmatrix, term.terms, m))[[2]][-1]
}

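Ugly, but it works. Because the strings depend on the order in which this function call encounters the variables in the terms, you may want to pass the full formula as order.as.in. The only thing left is to invert the mapping, which is trivial at this point.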
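The inversion step might look roughly like the sketch below. To stay self-contained it does not call names.for.term. Instead, the term-to-names list is built from the documented `assign` attribute of the model matrix, which is a swapped-in technique: it maps to the expanded term labels reported by `terms()` (so `some.factor*some.value` appears as `some.factor`, `some.value` and `some.factor:some.value`), not to the summands as literally written in the formula. The data frame `d` and the names `by.term` and `by.name` are assumptions made for illustration.

```r
# Hypothetical data for illustration; column names follow the question
set.seed(1)
d <- data.frame(
  target      = rnorm(120),
  some.bool   = rep(c(TRUE, FALSE), 60),
  some.factor = rep(c("Y", "N", "M"), 40),
  some.value  = rnorm(120),
  some.other  = rnorm(120)
)
fit <- lm(target ~ some.bool + some.factor*some.value +
            some.factor:some.other, d)

mm     <- model.matrix(fit)
labels <- attr(terms(fit), "term.labels")  # expanded term labels
idx    <- attr(mm, "assign")               # 0 = intercept, k = k-th label

# term -> coefficient names (intercept dropped)
keep    <- idx > 0
by.term <- split(colnames(mm)[keep],
                 factor(labels[idx[keep]], levels = labels))

# inverted mapping: coefficient name -> term
by.name <- setNames(rep(names(by.term), lengths(by.term)),
                    unlist(by.term, use.names = FALSE))
print(by.name)
```

A list produced by repeated calls to names.for.term (one element per formula summand) could be inverted with the same last statement, provided each element is a character vector of coefficient names.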