The summary of a linear model uses particular strings to name the coefficients in its output. For example,
summary(lm(
target ~ some.bool + some.factor + some.factor*some.value +
some.factor:some.other,
data.frame(target=rnorm(100), some.bool=sample(c(T, F), 100, T),
some.factor=sample(c('Y', 'N', 'M'), 100, T), some.value=rnorm(100),
some.other=rnorm(100))))
produces a coefficient table whose rows (besides `(Intercept)`) are named `some.boolTRUE`, `some.factorN`, `some.factorY`, `some.value`, `some.factorN:some.value`, `some.factorY:some.value`, `some.factorM:some.other`, `some.factorN:some.other` and `some.factorY:some.other`.
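How can I find out programmatically which rows of the table correspond to which terms of the input formula? I'd like to get a mapping such as: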
`some.boolTRUE` → some.bool
`some.factorN` → some.factor, some.factor*some.value
`some.factorY` → some.factor, some.factor*some.value
`some.value` → some.factor*some.value
`some.factorN:some.value` → some.factor*some.value
`some.factorN:some.other` → some.factor:some.other
My goal is to prepare a specific presentation of the results in which the regression coefficients are grouped by the input terms.
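For reference, base R already records a coarser version of this mapping: each column of `model.matrix()` (one per coefficient) carries an `"assign"` attribute that indexes into `attr(terms(fit), "term.labels")`, with 0 meaning the intercept. The sketch below assumes the question's data with a fixed seed; `coef.to.term` and `term.to.coefs` are hypothetical names. Note that it maps to the *expanded* terms (`some.factor:some.value`, not `some.factor*some.value` as typed), so reaching the summands as written still needs a step such as comparing against `attr(terms(~ some.factor*some.value), "term.labels")`.

```r
# Rebuild the model from the question (seeded so the run is repeatable).
set.seed(1)
d <- data.frame(target = rnorm(100),
                some.bool = sample(c(TRUE, FALSE), 100, TRUE),
                some.factor = sample(c("Y", "N", "M"), 100, TRUE),
                some.value = rnorm(100), some.other = rnorm(100))
fit <- lm(target ~ some.bool + some.factor + some.factor*some.value +
            some.factor:some.other, data = d)

# One model-matrix column per coefficient; "assign" holds, for each column,
# the index of its term in term.labels (0 = intercept, hence the + 1 below).
mm <- model.matrix(fit)
term.labels <- c("(Intercept)", attr(terms(fit), "term.labels"))
coef.to.term <- setNames(term.labels[attr(mm, "assign") + 1], colnames(mm))

# Group the coefficient names by the expanded term they belong to.
term.to.coefs <- split(names(coef.to.term),
                       factor(coef.to.term, levels = term.labels))
```

The column names of `mm` are the same strings as `names(coef(fit))`, so `coef.to.term` can be used to look up rows of `summary(fit)` directly.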
Answer (score: 0)
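So, I noticed that the code generating these names lives deep inside the model.matrix function, in a call to an external C routine. I can recover the names built for a term with a hack like this (term is an expression/symbol object taken from the formula itself, i.e. one summand of its right-hand side):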
names.for.term <- function(term, data, order.as.in=NULL) {
  # construct a simple formula that has only the requested term
  f <- formula(substitute(~ x, list(x=term)))
  # terms() fails on a bare symbol/call, so fall back to the one-term
  # formula when no reference formula is given
  if (is.null(order.as.in)) order.as.in <- f
  # make a terms object for manipulation
  term.terms <- terms(f, data=data)
  # what order do we want to consider variables in?
  requested.order <- na.omit(match(
    row.names(attr(terms(order.as.in), 'factors')),
    row.names(attr(term.terms, 'factors'))))
  # force the order of variables (setting row.names is enough;
  # values in this array are not important for the process of building
  # strings if you have only a single summand. if not, good luck)
  row.names(attr(term.terms, 'factors')) <-
    row.names(attr(term.terms, 'factors'))[requested.order]
  # we need model frame object to have columns in the same order as
  # rows above; types of variables (e.g. factors) are inferred from here
  m <- model.frame(f, data)[requested.order]
  # model.matrix.default turns character columns into factors before calling
  # the C code; do the same (data.frame() keeps strings as character by
  # default since R 4.0.0)
  m[] <- lapply(m, function(v) if (is.character(v)) factor(v) else v)
  # call deep into C code (an internal, unexported stats routine)
  dimnames(.External2(stats:::C_modelmatrix, term.terms, m))[[2]][-1]
}
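Ugly, but it works. Because the generated strings depend on the order in which this call encounters the variables in the terms object, you may want to pass the full formula as order.as.in. The only thing left is to invert the mapping, which is trivial at this point.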
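A minimal end-to-end sketch of the remaining steps, under stated assumptions: the input terms are taken to be the top-level `+` summands of the formula's right-hand side (parenthesised groups and `-` terms are not split), and the helpers `rhs.summands` and `invert.mapping` are hypothetical. To keep the block self-contained it calls the public `model.matrix()` on each one-term formula instead of `names.for.term()`; this yields the same kind of names, but orders the variables inside an interaction name as they appear in that one-term formula, which is exactly what `order.as.in` is there to fix. With the function above defined, `names.for.term(s, d, fml)` should be usable in its place.

```r
# Split the right-hand side of a formula into its top-level "+" summands,
# keeping each one as written (some.factor * some.value stays whole).
rhs.summands <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("+")) && length(expr) == 3)
    c(rhs.summands(expr[[2]]), rhs.summands(expr[[3]]))
  else
    list(expr)
}

# Invert a term -> coefficient-names list into a name -> terms list,
# keeping names in order of first appearance.
invert.mapping <- function(m) {
  term <- rep(names(m), lengths(m))
  coef.name <- unlist(m, use.names = FALSE)
  split(term, factor(coef.name, levels = unique(coef.name)))
}

# Data from the question, seeded so the run is repeatable.
set.seed(1)
d <- data.frame(target = rnorm(100),
                some.bool = sample(c(TRUE, FALSE), 100, TRUE),
                some.factor = sample(c("Y", "N", "M"), 100, TRUE),
                some.value = rnorm(100), some.other = rnorm(100))
fml <- target ~ some.bool + some.factor + some.factor*some.value +
  some.factor:some.other

summands <- rhs.summands(fml[[3]])
# Stand-in for names.for.term(): column names of the one-term model matrix,
# minus the intercept column.
term.to.names <- lapply(summands, function(s) {
  colnames(model.matrix(eval(call("~", s)), data = d))[-1]
})
names(term.to.names) <- vapply(summands,
                               function(s) paste(deparse(s), collapse = " "), "")

name.to.terms <- invert.mapping(term.to.names)
name.to.terms[["some.factorN"]]  # "some.factor" and "some.factor * some.value"
```

A one-term formula can generate names the full model never uses (for example `~ some.factor:some.value` on its own would include `some.factorM:some.value`); such extra entries simply never match a row of the summary, so they do no harm when looking rows up.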