这是一个简单的问题,但我无法在任何地方找到明确而有说服力的答案。如果我有一个包含一个或多个交互项的回归模型,例如:
mod1 <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)
coef(summary(mod1))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.900000 1.750674 13.080673 0.0000000000006057324
## factor(cyl)6 -3.775000 2.315925 -1.630018 0.1151545663620229670
## factor(cyl)8 -7.850000 1.957314 -4.010599 0.0004547582690011110
## factor(am)1 5.175000 2.052848 2.520888 0.0181760532676256310
## factor(cyl)6:factor(am)1 -3.733333 3.094784 -1.206331 0.2385525615801434851
## factor(cyl)8:factor(am)1 -4.825000 3.094784 -1.559075 0.1310692573417492068
确定哪些系数估算值适用于互动条件的确定方法是什么?显而易见的方法是grep()
为术语名称中的冒号。但是让我们假设一秒钟因为类似的事情而无法实现:
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(4,6,8), labels = paste("Cyl:", unique(mtcars$cyl)))
mod2 <- lm(mpg ~ cyl2 * factor(am), data = mtcars)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.900000 1.750674 13.080673 0.0000000000006057324
## cyl2Cyl: 4 -3.775000 2.315925 -1.630018 0.1151545663620229670
## cyl2Cyl: 8 -7.850000 1.957314 -4.010599 0.0004547582690011110
## factor(am)1 5.175000 2.052848 2.520888 0.0181760532676256310
## cyl2Cyl: 4:factor(am)1 -3.733333 3.094784 -1.206331 0.2385525615801434851
## cyl2Cyl: 8:factor(am)1 -4.825000 3.094784 -1.559075 0.1310692573417492068
我想也许terms()
对象会有用,但事实并非如此。我也可能会对术语的排序/编号做出一些假设,以获得预期的结果:
coef(summary(mod2))[5:6,]
## Estimate Std. Error t value Pr(>|t|)
## cyl2Cyl: 4:factor(am)1 -3.733333 3.094784 -1.206331 0.2385526
## cyl2Cyl: 8:factor(am)1 -4.825000 3.094784 -1.559075 0.1310693
但我不知道如何以一般方式这样做。
可以做些什么?
答案 0 :(得分:4)
这似乎有点复杂,但我们可以只列举所有主要效果,然后采取设定差异吗?
mod2 <- lm(mpg ~ cyl2 * factor(am) + wt * disp, data = mtcars)
variables <- labels(mod2)[attr(terms(mod2), "order") == 1]
factors <- sapply(names(mod2$xlevels), function(x) paste0(x, mod2$xlevels[[x]])[-1])
setdiff(colnames(model.matrix(mod2)), c("(Intercept)", variables, unlist(factors)))
# [1] "cyl2Cyl: 4:factor(am)1" "cyl2Cyl: 8:factor(am)1" "wt:disp"
答案 1 :(得分:2)
可能的单线解决方案是:
"assign"
这很麻烦*,但基本上它使用了模型矩阵的attr(model.matrix(mod2), "assign")
## [1] 0 1 1 2 3 3
属性:
which(colSums(attr(terms(mod2), "factors")[,attr(model.matrix(mod2), "assign")[-1L]]) == 2L)
## cyl2:factor(am) cyl2:factor(am)
## 4 5
获取交互项系数的名称索引(off-by-one):
coef()
然后可用于从"assign"
中提取相关行。这是因为系数的顺序由设计矩阵中的列的顺序决定,并且其== 2L
属性将这些列映射回原始公式中的项。
它不是超级干净但似乎有用。
* PS。在该示例中,仅提取二阶交互项。将>= 2L
更改为express-react-view
或其他内容可以获得更高阶的互动。
答案 2 :(得分:2)
我构建了一些只做这个in coefplot
的东西。
#' @title matchCoefs.default
#' @description Match coefficients to predictors
#' @details Matches coefficients to predictors using information from model matrices
#' @author Jared P. Lander
#' @aliases matchCoefs.default
#' @import reshape2
#' @param model Fitted model
#' @param \dots Further arguments
#' @return a data.frame matching predictors to coefficients
matchCoefs.default <- function(model, ...)
{
# get the terms
theTerms <- model$terms
# get the assignment position
#thePos <- model$assign
thePos <- get.assign(model)
# get intercept indicator
inter <- attr(theTerms, "intercept")
# get coef names
coefNames <- names(stats::coef(model))
# get pred names
predNames <- attr(theTerms, "term.labels")
# expand out pred names to match coefficient names
predNames <- predNames[thePos]
# if there's an intercept term add it to the pred names
if(inter == 1)
{
predNames <- c("(Intercept)", predNames)
}
# build data.frame linking term to coefficient name
matching <- data.frame(Term=predNames, Coefficient=coefNames, stringsAsFactors=FALSE)
## now match individual predictor to term
# get matrix as data.frame
factorMat <- as.data.frame(attr(theTerms, "factor"))
# add column from rownames as identifier
factorMat$.Pred <- rownames(factorMat)
factorMat$.Type <- attr(theTerms, "dataClasses")
# melt it down for comparison
factorMelt <- melt(factorMat, id.vars=c(".Pred", ".Type"), variable.name="Term")
factorMelt$Term <- as.character(factorMelt$Term)
# only keep rows where there's a match
factorMelt <- factorMelt[factorMelt$value == 1, ]
# again, bring in coefficient if needed
if(inter == 1)
{
factorMelt <- rbind(data.frame(.Pred="(Intercept)", .Type="(Intercept)", Term="(Intercept)", value=1, stringsAsFactors=FALSE), factorMelt)
}
# join into the matching data.frame
matching <- join(matching, factorMelt, by="Term")
return(matching)
}
然后你可以拨打matchCoefs.default(mod1)
。
答案 3 :(得分:1)
另一种方法(也有点复杂)是通过协变量名称,看看其他协变量的名称出现了多少次:
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(4,6,8), labels = paste("Cyl:", unique(mtcars$cyl)))
mod2 <- lm(mpg ~ cyl2 * factor(am), data = mtcars)
coef(summary(mod2))
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 22.900000 1.750674 13.080673 6.057324e-13
#> cyl2Cyl: 4 -3.775000 2.315925 -1.630018 1.151546e-01
#> cyl2Cyl: 8 -7.850000 1.957314 -4.010599 4.547583e-04
#> factor(am)1 5.175000 2.052848 2.520888 1.817605e-02
#> cyl2Cyl: 4:factor(am)1 -3.733333 3.094784 -1.206331 2.385526e-01
#> cyl2Cyl: 8:factor(am)1 -4.825000 3.094784 -1.559075 1.310693e-01
# Essentially an ugly double-loop
check.inter <- function(coefs){
which(lapply(coefs, function(co){
sum(unlist(lapply(coefs, function(co2){
grepl(co2, co, fixed = T)})))})>1)}
check.inter(names(coefficients(mod2)))
#> [1] 5 6
coef(summary(mod2))[check.inter(names(coefficients(mod2))),]
#> Estimate Std. Error t value Pr(>|t|)
#> cyl2Cyl: 4:factor(am)1 -3.733333 3.094784 -1.206331 0.2385526
#> cyl2Cyl: 8:factor(am)1 -4.825000 3.094784 -1.559075 0.1310693
由reprex package(v0.2.0)创建于2018-05-17。
答案 4 :(得分:1)
我会选择整洁的东西:
# Look for factor variables and replace ':' with '_'
clean_colons = function(x){
x %>% as_tibble %>%
mutate_if(is.factor, function(x){str_replace_all(x,":","_")}) %>%
return()
}
# Define the tricky cyl2 varible
mtcars$cyl2 = factor(mtcars$cyl, levels = c(4,6,8),
labels = paste0("Cyl:", unique(mtcars$cyl)))
# Create the model and output parameters
mtcars %>% clean_colons %>% lm(mpg ~ cyl2 * factor(am), data = .) %>% tidy %>%
as_tibble
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 19.1 1.52 12.6 1.38e-12
2 cyl2Cyl_6 3.77 2.32 1.63 1.15e- 1
3 cyl2Cyl_8 -4.08 1.75 -2.33 2.80e- 2
4 factor(am)1 1.44 2.32 0.623 5.39e- 1
5 cyl2Cyl_6:factor(am)1 3.73 3.09 1.21 2.39e- 1
6 cyl2Cyl_8:factor(am)1 -1.09 3.28 -0.333 7.42e- 1
# Now we no longer have the colon-in-variable-name problem, so we can do
mtcars %>% clean_colons %>% lm(mpg ~ cyl2 * factor(am), data = .) %>% tidy %>%
as_tibble %>% filter(term %>% str_detect("\\w{1}:\\w{1}")) %>% pull(estimate)
[1] 3.733333 -1.091667
<强>更新强>
以上失败,如果冒号后面没有空格,可以这样解决:
Set xmlDoc = CreateObject("Msxml2.DOMDocument")
Dim nodeBook
Dim nodeId
xmlDoc.async = False
xmlDoc.Load ("xmlfile url")
If (xmlDoc.parseError.errorCode <> 0) Then
Dim myErr
Set myErr = xmlDoc.parseError
MsgBox ("You have error " & myErr.reason)
Else
Set nodeBook = xmlDoc.selectSingleNode("//Program")
Set nodeId = nodeBook.getAttributeNode("description")
wscript.Echo nodeId.value
End If
答案 5 :(得分:1)
这也很复杂,但至少你可以明确地跟踪你在做什么:
# require package
require(Hmisc)
# load and process data
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
# fit model
m <- lm(mpg ~ hp*wt + disp + cyl*gear, data=mtcars)
# extract all variables used in right hand side of the model
variables <- as.character(attr(terms(m), "variables"))[-1L]
# extract model coefficients
s <- summary(m)$coeff
s
# discard unwanted rows corresponding to continuous predictors
s1 <- s[rownames(s) %nin% variables[-1], ]
s1
# discard unwanted rows corresponding to categorical predictor gear
s2 <- s1[rownames(s1) %nin% paste0("gear",levels(mtcars$gear)), ]
s2
产出如下:
> s
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.80219399 5.977952455 7.3272905 4.404820e-07
hp -0.11303405 0.040769042 -2.7725462 1.174817e-02
wt -6.76725414 1.868428113 -3.6218970 1.699630e-03
disp -0.00332244 0.014012039 -0.2371133 8.149808e-01
cyl6 2.87780626 3.180169670 0.9049222 3.762789e-01
cyl8 2.76416050 3.605957805 0.7665538 4.523007e-01
gear4 3.66877217 2.529082701 1.4506335 1.623857e-01
gear5 4.25400864 2.931068582 1.4513508 1.621881e-01
hp:wt 0.02401629 0.009953734 2.4127921 2.555079e-02
cyl6:gear4 -4.65994590 3.331261825 -1.3988531 1.771739e-01
cyl6:gear5 -3.86789967 4.400309909 -0.8790062 3.898369e-01
cyl8:gear5 -2.08842224 3.980538289 -0.5246582 6.055881e-01
> s1
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.80219399 5.977952455 7.3272905 4.404820e-07
cyl6 2.87780626 3.180169670 0.9049222 3.762789e-01
cyl8 2.76416050 3.605957805 0.7665538 4.523007e-01
gear4 3.66877217 2.529082701 1.4506335 1.623857e-01
gear5 4.25400864 2.931068582 1.4513508 1.621881e-01
hp:wt 0.02401629 0.009953734 2.4127921 2.555079e-02
cyl6:gear4 -4.65994590 3.331261825 -1.3988531 1.771739e-01
cyl6:gear5 -3.86789967 4.400309909 -0.8790062 3.898369e-01
cyl8:gear5 -2.08842224 3.980538289 -0.5246582 6.055881e-01
> s2
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.80219399 5.977952455 7.3272905 4.404820e-07
cyl6 2.87780626 3.180169670 0.9049222 3.762789e-01
cyl8 2.76416050 3.605957805 0.7665538 4.523007e-01
hp:wt 0.02401629 0.009953734 2.4127921 2.555079e-02
cyl6:gear4 -4.65994590 3.331261825 -1.3988531 1.771739e-01
cyl6:gear5 -3.86789967 4.400309909 -0.8790062 3.898369e-01
cyl8:gear5 -2.08842224 3.980538289 -0.5246582 6.055881e-01
可以循环所有分类预测变量以自动执行最后两个步骤。