我正在R中构建一个线性模型,我正在努力实现具体的计划对比。该模型涉及交互术语。我相信我知道如何正确指定glht()
所采用的矩阵,以便为对比提供估计。但是,我不确定如何在矩阵中指定交互计划对比。这是一个可重复的例子:
library(multcomp)
# Create minimal dataset
set.seed(100)
df <- data.frame(response = rnorm(2000),
age = sample(15:60, 2000, replace = TRUE),
sex = rep(c("Male", "Female"), each = 1000),
ses = rep(c("L", "LM", "MH", "H"), 500))
# Basic linear model
m1 <- lm(response ~ age + sex * ses,
data = df)
summary(m1)
Call:
lm(formula = response ~ age + sex * ses, data = df)
Residuals:
Min 1Q Median 3Q Max
-3.3423 -0.6697 0.0044 0.6716 3.2542
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0514746 0.0889766 -0.579 0.563
age -0.0003435 0.0016934 -0.203 0.839
sexMale 0.0819362 0.0900868 0.910 0.363
sesL 0.0769168 0.0901245 0.853 0.394
sesLM 0.1289027 0.0901339 1.430 0.153
sesMH 0.0681291 0.0900821 0.756 0.450
sexMale:sesL -0.0661439 0.1274402 -0.519 0.604
sexMale:sesLM -0.0925924 0.1275157 -0.726 0.468
sexMale:sesMH -0.1188700 0.1273978 -0.933 0.351
Residual standard error: 1.007 on 1991 degrees of freedom
Multiple R-squared: 0.001591, Adjusted R-squared: -0.002421
F-statistic: 0.3966 on 8 and 1991 DF, p-value: 0.9229
# Create a matrix of linear combinations
# Here 'H' and 'Female' are the reference categories,
lc <- rbind("Female:H" = c(1, 0, 0, 0, 0, 0, 0, 0, 0),
"Female:L" = c(1, 0, 0, 1, 0, 0, 0, 0, 0),
"Female:LM" = c(1, 0, 0, 0, 1, 0, 0, 0, 0),
"Female:MH" = c(1, 0, 0, 0, 0, 1, 0, 0, 0),
"Male vs Female difference in L vs H" = c(1, 0, 0, 0, 0, 0, 1, 0, 0),
"Male vs Female difference in LM vs H" = c(1, 0, 0, 0, 0, 0, 0, 1, 0),
"Male vs Female difference in MH vs H" = c(1, 0, 0, 0, 0, 0, 0, 0, 1))
# Create matrix for the linfct argument in glht()
# One contrast compares the LM to the L category in females
# Another contrast compares the pooled LM and L categories to the H category in females
k <- rbind("Females: LM vs. L" =
lc["Female:LM", ] - lc["Female:L", ],
"Females: Pooled L and LM vs H" =
((lc["Female:LM", ] + lc["Female:L", ]) / 2) - lc["Female:H", ])
c1 <- glht(m1, linfct = k)
s1 <- summary(c1, adjusted(type = "none"))
confint(c1, adjusted(type = "none"))
我想添加k
,对比:
修改
例如,在上面的摘要输出中,您会将sexMale:sesL
术语解释为:L和H之间的额外差异,对于男性与女性。
CrossValidated https://stats.stackexchange.com/questions/112944/design-of-matrix-of-contrasts-in-r中的这篇文章有助于我走到这一步。但是,我不确定来自lc
的{{1}}的线性组合应该在k
中减去以获得我想要的系数。
非常感谢帮助。
答案 0 :(得分:2)
很遗憾,我认为您的lc
和k
不正确。矩阵的col-number对应names(m1$coef)
的元素编号。
[1] Intercept, [2] age, [3] sexMale, [4] sesL, [5] sesLM, [6] sesMH, [7] sexMale:sesL, [8] sexMale:sesLM, [9] sexMale:sesMH
。
因此,"Female:L" = c(1, 0, 1, 0, 0, 0, 0, 0, 0)
表示"Male:H"
,"Female:LM" = c(1, 0, 0, 1, 0, 0, 0, 0, 0)
表示Female:L
。您将这些差异用作k[,1]
。请注意s1
的估计。
s1
# Linear Hypotheses:
# Estimate Std. Error t value Pr(>|t|)
# Females: LM vs. L == 0 -0.005019 0.090091 -0.056 0.956
# Females: Pooled L and LM vs H == 0 0.079427 0.078038 1.018 0.309
# In fact, you did "sexMale:H vs Female:L".
m1$coef["sesL"] - m1$coef["sexMale"] # -0.005019421
您可以通过您链接的帖子显示的方法获得正确的矩阵。
group <- paste0(df$sex, df$ses)
group <- aggregate(model.matrix(m1) ~ group, FUN=mean)
rownames(group) <- group$group
group <- group[,-1]
group$age <- 0
lc2 <- as.matrix(group)
rownames(lc2) <- c("Female:H", "Female:L", "Female:LM", "Female:MH",
"Male:H", "Male:L", "Male:LM", "Male:MH")
lc2
(Intercept) age sexMale sesL sesLM sesMH sexMale:sesL sexMale:sesLM sexMale:sesMH
Female:H 1 0 0 0 0 0 0 0 0
Female:L 1 0 0 1 0 0 0 0 0
Female:LM 1 0 0 0 1 0 0 0 0
Female:MH 1 0 0 0 0 1 0 0 0
Male:H 1 0 1 0 0 0 0 0 0
Male:L 1 0 1 1 0 0 1 0 0
Male:LM 1 0 1 0 1 0 0 1 0
Male:MH 1 0 1 0 0 1 0 0 1
k2 <- rbind("Females: LM vs. L" =
lc2["Female:LM", ] - lc2["Female:L", ],
"Females: Pooled L and LM vs H" =
((lc2["Female:LM", ] + lc2["Female:L", ]) / 2) - lc2["Female:H", ])
c1_2 <- glht(m1, linfct = k2)
s1_2 <- summary(c1_2, adjusted(type = "none"))
s1_2
# Linear Hypotheses:
# Estimate Std. Error t value Pr(>|t|)
# Females: LM vs. L == 0 0.05199 0.09008 0.577 0.564
# Females: Pooled L and LM vs H == 0 0.10291 0.07807 1.318 0.188
m1$coef["sesLM"] - m1$coef["sesL"] # 0.05198589
(m1$coef["sesL"] + m1$coef["sesLM"])/2 # 0.1029097
sexMale
的身份是相互作用的术语Male : H
,只是为了确定。
m2 <- lm(response ~ age + ses + sex:ses , data = df)
summary(m2) # excerpt
# sesH:sexMale 0.0819362 0.0900868 0.910 0.363
summary(m1) # excerpt
# sexMale 0.0819362 0.0900868 0.910 0.363
(注意:差异没有截距,因为它是一个常用术语)
k3 <- rbind(
"Male vs Female difference in L vs H" =
(lc2["Male:L",] - lc2["Male:H",]) - (lc2["Female:L",] - lc2["Female:H",]),
"Male vs Female difference in LM vs H" =
(lc2["Male:LM",] - lc2["Male:H",]) - (lc2["Female:LM",] - lc2["Female:H",]),
"Male vs Female difference in MH vs H" =
(lc2["Male:MH",] - lc2["Male:H",]) - (lc2["Female:MH",] - lc2["Female:H",]),
# above is modified lc[5:7, ], below is what you want
"Male vs Female difference in LM vs L" =
(lc2["Male:LM",] - lc2["Male:L",]) - (lc2["Female:LM",] - lc2["Female:L",]),
"Male vs Female difference in Pooled L and LM vs H" =
((lc2["Male:L", ] + lc2["Male:LM", ]) / 2 - lc2["Male:H", ])
- ((lc2["Female:L", ] + lc2["Female:LM", ]) / 2 - lc2["Female:H", ])
)
k3 # I deleted some character, space, digit because of the space and viewability
(Intercept) age sexM sesL sesLM sesMH sexM:sesL sexM:sesLM sexM:sesMH
M vs F diff in L vs H 0 0 0 0 0 0 1 0 0
M vs F diff in LM vs H 0 0 0 0 0 0 0 1 0
M vs F diff in MH vs H 0 0 0 0 0 0 0 0 1
M vs F diff in LM vs L 0 0 0 0 0 0 -1 1 0
MvsF diff in L&LM vs H 0 0 0 0 0 0 0.5 0.5 0