Question

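Sometimes your research may predict that the size of a regression coefficient differs between groups. For example, you might believe that the regression coefficient of height predicting weight differs across three age groups (young, middle-aged, senior). Below is a data file with 3 fictional young people, 3 fictional middle-aged people and 3 fictional senior citizens, along with their height and weight. The variable age indicates the age group: 1 for young, 2 for middle-aged and 3 for senior.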

So how can I use R to compare the regression coefficients (mainly the slopes) of three (or more) groups?

Sample data:

age height weight 
 1  56 140   
 1  60 155   
 1  64 143     
 2  56 117   
 2  60 125   
 2  64 133      
 3  74 245   
 3  75 241   
 3  82 269
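A simple informal first step is to look at the slope within each group. The following minimal base-R sketch is not part of either answer. It assumes the sample data are typed into a data frame named DF with age stored as a factor, and the names fits and slopes are illustrative. It only describes the three slopes; it does not test whether they differ.

```r
# Sample data from the question (age: 1 = young, 2 = middle-aged, 3 = senior)
DF <- data.frame(
  age    = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  height = c(56, 60, 64, 56, 60, 64, 74, 75, 82),
  weight = c(140, 155, 143, 117, 125, 133, 245, 241, 269)
)

# Fit weight ~ height separately within each age group
fits <- lapply(split(DF, DF$age), function(d) lm(weight ~ height, data = d))

# Pull out the height slope from each per-group fit (roughly 0.375, 2 and 3.37)
slopes <- sapply(fits, function(f) coef(f)[["height"]])
print(slopes)
```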

Answer 1

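To "determine whether the regression coefficients differ across the three age groups", we can use the anova function in R. For example, using the data from the question (shown reproducibly in Note 4 at the end):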

fm1 <- lm(weight ~ height, DF)            # same intercept and slope for all groups
fm3 <- lm(weight ~ age/(height - 1), DF)  # separate intercept and slope for each age group

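Comparing the two models gives a p-value of about 2.7% (0.02696), so we would conclude that the regression coefficients differ across groups if we use a 5% cutoff, but not if we use a 1% cutoff.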

> anova(fm1, fm3)
 Analysis of Variance Table

Model 1: weight ~ height
Model 2: weight ~ age/(height - 1)
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)  
1      7 2991.57                              
2      3  149.01  4    2842.6 14.307 0.02696 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
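The F statistic in this table comes from the usual nested-model comparison: the drop in residual sum of squares per extra parameter, divided by the residual variance of the larger model, F = ((RSS1 - RSS3)/(Df1 - Df3)) / (RSS3/Df3). The following sketch recomputes it from the fitted models; the variable names (rss1, F_stat, etc.) are illustrative, and it should reproduce F = 14.307 and p = 0.02696.

```r
# Data from the question, as in Note 4
DF <- data.frame(
  age    = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  height = c(56, 60, 64, 56, 60, 64, 74, 75, 82),
  weight = c(140, 155, 143, 117, 125, 133, 245, 241, 269)
)

fm1 <- lm(weight ~ height, DF)            # common intercept and slope
fm3 <- lm(weight ~ age/(height - 1), DF)  # intercept and slope per age group

rss1 <- deviance(fm1)     # residual sum of squares of the smaller model
rss3 <- deviance(fm3)     # residual sum of squares of the larger model
df1  <- df.residual(fm1)  # 9 observations - 2 coefficients = 7
df3  <- df.residual(fm3)  # 9 observations - 6 coefficients = 3

# Extra sum of squares per extra parameter, over the larger model's residual variance
F_stat <- ((rss1 - rss3) / (df1 - df3)) / (rss3 / df3)
p_val  <- pf(F_stat, df1 - df3, df3, lower.tail = FALSE)
print(c(F = F_stat, p = p_val))

# Should match the F and Pr(>F) columns of anova(fm1, fm3)
print(anova(fm1, fm3))
```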

Note 1: fm3 above has 6 coefficients: an intercept and a slope for each group. If you want 4 coefficients (a common intercept and separate slopes), use

lm(weight ~ age:height, DF)
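To see the two parameterizations side by side, the sketch below lists the coefficient names of both fits and, since the common-intercept model is nested in fm3, uses anova to test whether separate intercepts are needed. The model name fm_ci is hypothetical; the data are those of Note 4.

```r
DF <- data.frame(
  age    = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  height = c(56, 60, 64, 56, 60, 64, 74, 75, 82),
  weight = c(140, 155, 143, 117, 125, 133, 245, 241, 269)
)

fm3   <- lm(weight ~ age/(height - 1), DF)  # 3 intercepts + 3 slopes
fm_ci <- lm(weight ~ age:height, DF)        # 1 common intercept + 3 slopes (hypothetical name)

print(names(coef(fm3)))    # 6 coefficients
print(names(coef(fm_ci)))  # 4 coefficients

# In fm3 the slope coefficients equal the slopes of separate per-group regressions
print(coef(fm3)[grepl("height", names(coef(fm3)))])

# fm_ci is nested in fm3, so anova() tests whether the groups need separate intercepts
print(anova(fm_ci, fm3))
```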

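Note 2: We can also compare models in which a subset of the levels share the same coefficients. For example, we can compare a model in which ages 1 and 2 are the same against the models in which all groups are the same (fm1) and all are different (fm3):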

fm2 <- lm(weight ~ age/(height - 1), transform(DF, age = factor(c(1, 1, 3)[age])))
anova(fm1, fm2, fm3)

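If you perform a large number of tests, some of them may come out significant purely by chance, so you will want to lower the p-value cutoff.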
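A common alternative to choosing a stricter cutoff by hand is to adjust the p-values for multiple testing. The sketch below applies Holm's method through base R's p.adjust to the two tests produced by anova(fm1, fm2, fm3). Holm is a substituted choice, not something this answer specifies; other methods accepted by p.adjust (for example "bonferroni") are used the same way. The names raw_p and adj_p are illustrative.

```r
DF <- data.frame(
  age    = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  height = c(56, 60, 64, 56, 60, 64, 74, 75, 82),
  weight = c(140, 155, 143, 117, 125, 133, 245, 241, 269)
)

fm1 <- lm(weight ~ height, DF)
# Ages 1 and 2 share one line: c(1, 1, 3)[age] maps factor codes 1, 2, 3 to 1, 1, 3
fm2 <- lm(weight ~ age/(height - 1), transform(DF, age = factor(c(1, 1, 3)[age])))
fm3 <- lm(weight ~ age/(height - 1), DF)

tab   <- anova(fm1, fm2, fm3)
raw_p <- tab[["Pr(>F)"]][-1]               # row 1 has no test, so drop it
adj_p <- p.adjust(raw_p, method = "holm")  # Holm step-down adjustment
print(cbind(raw = raw_p, holm = adj_p))
```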

Note 3: There are some notes on lm formulas here: http://science.nature.nps.gov/im/datamgmt/statistics/r/formulas/

Note 4: We used this as the input:

Lines <- "age height weight
1 56 140
1 60 155
1 64 143
2 56 117
2 60 125
2 64 133
3 74 245
3 75 241
3 82 269"
DF <- read.table(text = Lines, header = TRUE)
DF$age <- factor(DF$age)

Answer 2

There is a good answer to this on CrossValidated, but briefly:

library(emmeans)  # provides emtrends(); install with install.packages("emmeans") if needed
data <- data.frame(age = factor(c(1,1,1,2,2,2,3,3,3)),
                   height = c(56,60,64,56,60,64,74,75,82),
                   weight = c(140,155,142,117,125,133,245,241,269))

model <- lm(weight ~ height*age, data)
anova(model) #check the results

Analysis of Variance Table

Response: weight
           Df  Sum Sq Mean Sq  F value    Pr(>F)    
height      1 25392.3 25392.3 481.5984 0.0002071 ***
age         2  2707.4  1353.7  25.6743 0.0129688 *  
height:age  2   169.0    84.5   1.6027 0.3361518    
Residuals   3   158.2    52.7                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
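In this table the height:age row is the test of whether the slopes differ across age groups. Because it is the last term in the sequential table, the same F test can be obtained by comparing a parallel-lines model (one common slope, separate intercepts) with the separate-slopes model. A base-R sketch, with illustrative model names common_slope and sep_slopes, using the data exactly as entered in this answer:

```r
# Data exactly as entered in this answer (the third weight is 142 here, 143 in the question)
data <- data.frame(age = factor(c(1,1,1,2,2,2,3,3,3)),
                   height = c(56,60,64,56,60,64,74,75,82),
                   weight = c(140,155,142,117,125,133,245,241,269))

common_slope <- lm(weight ~ height + age, data)  # parallel lines: one slope, separate intercepts
sep_slopes   <- lm(weight ~ height * age, data)  # separate slope and intercept per age group

# F test for "all slopes equal"; should agree with the height:age row above
cmp <- anova(common_slope, sep_slopes)
print(cmp)
```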


slopes <- emtrends(model, 'age', var = 'height') #gets each slope
slopes
age height.trend   SE df lower.CL upper.CL
1           0.25 1.28  3    -3.84     4.34
2           2.00 1.28  3    -2.09     6.09
3           3.37 1.18  3    -0.38     7.12

Confidence level used: 0.95 
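For a model this simple, the same slopes and intervals can be reproduced without emtrends. With R's default treatment contrasts, group 1's slope is the height coefficient and groups 2 and 3 add height:age2 and height:age3, so each slope and its standard error are linear combinations of coef(model) and vcov(model). A base-R sketch under that assumption; the contrast matrix L is illustrative:

```r
data <- data.frame(age = factor(c(1,1,1,2,2,2,3,3,3)),
                   height = c(56,60,64,56,60,64,74,75,82),
                   weight = c(140,155,142,117,125,133,245,241,269))

model <- lm(weight ~ height * age, data)
b <- coef(model)
V <- vcov(model)

# One row per age group: the coefficient combination giving that group's slope
L <- matrix(0, nrow = 3, ncol = length(b),
            dimnames = list(levels(data$age), names(b)))
L[, "height"] <- 1
L["2", "height:age2"] <- 1
L["3", "height:age3"] <- 1

est   <- drop(L %*% b)                  # per-group slopes
se    <- sqrt(diag(L %*% V %*% t(L)))   # their standard errors
tcrit <- qt(0.975, df.residual(model))  # 95% t critical value on 3 df
print(cbind(height.trend = est, SE = se,
            lower.CL = est - tcrit * se, upper.CL = est + tcrit * se))
```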


pairs(slopes) #gets their comparisons two by two
contrast estimate   SE df t.ratio p.value
1 - 2       -1.75 1.82  3 -0.964  0.6441 
1 - 3       -3.12 1.74  3 -1.790  0.3125 
2 - 3       -1.37 1.74  3 -0.785  0.7363 

P value adjustment: tukey method for comparing a family of 3 estimates
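The pairwise comparisons can likewise be sketched in base R as differences of those slope combinations. The p-values below assume that the Tukey adjustment for "a family of 3 estimates" is the studentized-range adjustment computed with ptukey on sqrt(2) times the absolute t ratio; treat this as a reconstruction, not emmeans' own code. Names such as D and p.value are illustrative.

```r
data <- data.frame(age = factor(c(1,1,1,2,2,2,3,3,3)),
                   height = c(56,60,64,56,60,64,74,75,82),
                   weight = c(140,155,142,117,125,133,245,241,269))

model <- lm(weight ~ height * age, data)
b <- coef(model)
V <- vcov(model)

# Per-group slope as a linear combination of coefficients (default treatment contrasts)
L <- matrix(0, nrow = 3, ncol = length(b),
            dimnames = list(levels(data$age), names(b)))
L[, "height"] <- 1
L["2", "height:age2"] <- 1
L["3", "height:age3"] <- 1

# Differences of slopes for each pair of groups: 1 - 2, 1 - 3, 2 - 3
idx <- combn(3, 2)
D <- t(apply(idx, 2, function(ij) L[ij[1], ] - L[ij[2], ]))
rownames(D) <- apply(idx, 2, paste, collapse = " - ")

estimate <- drop(D %*% b)
SE       <- sqrt(diag(D %*% V %*% t(D)))
t.ratio  <- estimate / SE
# Tukey adjustment for 3 estimates via the studentized range distribution
p.value  <- ptukey(sqrt(2) * abs(t.ratio), nmeans = 3,
                   df = df.residual(model), lower.tail = FALSE)
print(cbind(estimate, SE, t.ratio, p.value))
```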

How to compare regression coefficients of three (or more) groups using R?