Factors and dummies in R regression

Date: 2017-10-17 16:21:34

Tags: r regression interaction

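OK, I have two questions, possibly related, about dummies and factors. I will use an example that closely resembles my database. I have 20 columns with names, for example the presidents of a country (e.g. "George W.", "Bill C.", etc.). In addition, I have 25 columns of strategies (e.g. "str_1", "str2", etc.). They are all in the same database, say "dat", along with other variables such as y and x.

For example: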

=================================
y   x   presidents  strategies
=================================
20  2   Bill.C      3_A
10  1   George.W    2_B
10  1   Tom_C       3_C
3   2   Tom_C       2_D
4   4   John.C      3_A
4   3   Bill.C      2_A

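I want to run a regression of y ~ x + dummies for presidents and strategies + interactions between presidents and strategies.

I have already created dummies for each of the 20 presidents and each of the 25 strategies, but I don't know how to create the interactions between each president and each strategy (this is the first part of my question). Assuming I can do that easily, is there another way to specify my regression without writing out the 20 * 25 interactions one by one (I know Stata has some commands for the same problem)?

Maybe these are separate questions, but I'm not sure.

Thanks in advance.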

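1 Answer:

Answer 0: (score: 0)

lm and glm automatically convert factor variables into the corresponding dummy variables (leaving one level out as the reference category). The formula operator `:` adds the interaction terms, and `a*b` is shorthand for `a + b + a:b`, so you never have to write the interactions one by one. You can simply do the following: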

mod1 = lm(y ~ x + presidents + strategies + presidents:strategies, data = df1)
mod2 = lm(y ~ x + presidents*strategies, data = df1)
mod3 = glm(y ~ x + presidents + strategies + presidents:strategies, data = df1)
mod4 = glm(y ~ x + presidents*strategies, data = df1)

summary(mod1)
summary(mod2)
summary(mod3)
summary(mod4)
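
To see which dummy and interaction columns the formula generates before fitting anything, one option is to inspect the design matrix with model.matrix(). The sketch below uses a small made-up data frame (`toy` is a hypothetical name, not the asker's database) with 3 presidents and 2 strategies.

```r
# Hypothetical toy data: 3 presidents crossed with 2 strategies.
toy <- data.frame(
  presidents = factor(c("Bill.C", "George.W", "Tom_C",
                        "Bill.C", "George.W", "Tom_C")),
  strategies = factor(c("2_A", "2_A", "2_A", "3_A", "3_A", "3_A"))
)

# model.matrix() builds the same design matrix that lm()/glm() use internally.
# The first level of each factor (Bill.C, 2_A) becomes the reference category.
mm <- model.matrix(~ presidents * strategies, data = toy)

# Intercept + 2 president dummies + 1 strategy dummy + 2 interaction columns
print(colnames(mm))
```

With 3 and 2 levels this gives 1 + 2 + 1 + 2 = 6 columns. By the same logic, 20 presidents and 25 strategies would give 19 * 24 = 456 interaction columns, all generated from the single term `presidents * strategies`.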

Results:

> summary(mod1)

Call:
lm(formula = y ~ x + presidents + strategies + presidents:strategies, 
    data = df1)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.3690  -6.1273  -0.1699   6.4295  17.4156 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       14.4782     3.0799   4.701 5.15e-06 ***
x                                 -0.1692     0.2141  -0.790    0.431    
presidentsGeorge.W                11.1984     8.8283   1.268    0.206    
presidentsJohn.C                   4.1281     4.2305   0.976    0.330    
presidentsTom_C                    4.9604     3.6271   1.368    0.173    
strategies2_B                      1.6203     3.5736   0.453    0.651    
strategies2_D                     -1.7246     3.6550  -0.472    0.638    
strategies3_A                      1.7663     3.2966   0.536    0.593    
strategies3_C                     -0.5787     3.8440  -0.151    0.881    
presidentsGeorge.W:strategies2_B  -9.9934    10.0125  -0.998    0.320    
presidentsJohn.C:strategies2_B    -1.5192     5.8696  -0.259    0.796    
presidentsTom_C:strategies2_B     -0.8962     5.0202  -0.179    0.859    
presidentsGeorge.W:strategies2_D  -7.5266     9.7414  -0.773    0.441    
presidentsJohn.C:strategies2_D     1.7179     6.4375   0.267    0.790    
presidentsTom_C:strategies2_D     -1.1020     5.0551  -0.218    0.828    
presidentsGeorge.W:strategies3_A -11.9783     9.3115  -1.286    0.200    
presidentsJohn.C:strategies3_A    -2.8849     5.0866  -0.567    0.571    
presidentsTom_C:strategies3_A     -5.0305     4.4068  -1.142    0.255    
presidentsGeorge.W:strategies3_C  -6.5116     9.7387  -0.669    0.505    
presidentsJohn.C:strategies3_C    -4.3792     6.0389  -0.725    0.469    
presidentsTom_C:strategies3_C     -1.3257     5.3821  -0.246    0.806    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.364 on 179 degrees of freedom
Multiple R-squared:  0.064, Adjusted R-squared:  -0.04058 
F-statistic: 0.612 on 20 and 179 DF,  p-value: 0.9007

> summary(mod2)

Call:
lm(formula = y ~ x + presidents * strategies, data = df1)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.3690  -6.1273  -0.1699   6.4295  17.4156 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       14.4782     3.0799   4.701 5.15e-06 ***
x                                 -0.1692     0.2141  -0.790    0.431    
presidentsGeorge.W                11.1984     8.8283   1.268    0.206    
presidentsJohn.C                   4.1281     4.2305   0.976    0.330    
presidentsTom_C                    4.9604     3.6271   1.368    0.173    
strategies2_B                      1.6203     3.5736   0.453    0.651    
strategies2_D                     -1.7246     3.6550  -0.472    0.638    
strategies3_A                      1.7663     3.2966   0.536    0.593    
strategies3_C                     -0.5787     3.8440  -0.151    0.881    
presidentsGeorge.W:strategies2_B  -9.9934    10.0125  -0.998    0.320    
presidentsJohn.C:strategies2_B    -1.5192     5.8696  -0.259    0.796    
presidentsTom_C:strategies2_B     -0.8962     5.0202  -0.179    0.859    
presidentsGeorge.W:strategies2_D  -7.5266     9.7414  -0.773    0.441    
presidentsJohn.C:strategies2_D     1.7179     6.4375   0.267    0.790    
presidentsTom_C:strategies2_D     -1.1020     5.0551  -0.218    0.828    
presidentsGeorge.W:strategies3_A -11.9783     9.3115  -1.286    0.200    
presidentsJohn.C:strategies3_A    -2.8849     5.0866  -0.567    0.571    
presidentsTom_C:strategies3_A     -5.0305     4.4068  -1.142    0.255    
presidentsGeorge.W:strategies3_C  -6.5116     9.7387  -0.669    0.505    
presidentsJohn.C:strategies3_C    -4.3792     6.0389  -0.725    0.469    
presidentsTom_C:strategies3_C     -1.3257     5.3821  -0.246    0.806    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.364 on 179 degrees of freedom
Multiple R-squared:  0.064, Adjusted R-squared:  -0.04058 
F-statistic: 0.612 on 20 and 179 DF,  p-value: 0.9007

> summary(mod3)

Call:
glm(formula = y ~ x + presidents + strategies + presidents:strategies, 
    data = df1)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-17.3690   -6.1273   -0.1699    6.4295   17.4156  

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       14.4782     3.0799   4.701 5.15e-06 ***
x                                 -0.1692     0.2141  -0.790    0.431    
presidentsGeorge.W                11.1984     8.8283   1.268    0.206    
presidentsJohn.C                   4.1281     4.2305   0.976    0.330    
presidentsTom_C                    4.9604     3.6271   1.368    0.173    
strategies2_B                      1.6203     3.5736   0.453    0.651    
strategies2_D                     -1.7246     3.6550  -0.472    0.638    
strategies3_A                      1.7663     3.2966   0.536    0.593    
strategies3_C                     -0.5787     3.8440  -0.151    0.881    
presidentsGeorge.W:strategies2_B  -9.9934    10.0125  -0.998    0.320    
presidentsJohn.C:strategies2_B    -1.5192     5.8696  -0.259    0.796    
presidentsTom_C:strategies2_B     -0.8962     5.0202  -0.179    0.859    
presidentsGeorge.W:strategies2_D  -7.5266     9.7414  -0.773    0.441    
presidentsJohn.C:strategies2_D     1.7179     6.4375   0.267    0.790    
presidentsTom_C:strategies2_D     -1.1020     5.0551  -0.218    0.828    
presidentsGeorge.W:strategies3_A -11.9783     9.3115  -1.286    0.200    
presidentsJohn.C:strategies3_A    -2.8849     5.0866  -0.567    0.571    
presidentsTom_C:strategies3_A     -5.0305     4.4068  -1.142    0.255    
presidentsGeorge.W:strategies3_C  -6.5116     9.7387  -0.669    0.505    
presidentsJohn.C:strategies3_C    -4.3792     6.0389  -0.725    0.469    
presidentsTom_C:strategies3_C     -1.3257     5.3821  -0.246    0.806    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 69.96038)

    Null deviance: 13379  on 199  degrees of freedom
Residual deviance: 12523  on 179  degrees of freedom
AIC: 1439

Number of Fisher Scoring iterations: 2

> summary(mod4)

Call:
glm(formula = y ~ x + presidents * strategies, data = df1)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-17.3690   -6.1273   -0.1699    6.4295   17.4156  

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                       14.4782     3.0799   4.701 5.15e-06 ***
x                                 -0.1692     0.2141  -0.790    0.431    
presidentsGeorge.W                11.1984     8.8283   1.268    0.206    
presidentsJohn.C                   4.1281     4.2305   0.976    0.330    
presidentsTom_C                    4.9604     3.6271   1.368    0.173    
strategies2_B                      1.6203     3.5736   0.453    0.651    
strategies2_D                     -1.7246     3.6550  -0.472    0.638    
strategies3_A                      1.7663     3.2966   0.536    0.593    
strategies3_C                     -0.5787     3.8440  -0.151    0.881    
presidentsGeorge.W:strategies2_B  -9.9934    10.0125  -0.998    0.320    
presidentsJohn.C:strategies2_B    -1.5192     5.8696  -0.259    0.796    
presidentsTom_C:strategies2_B     -0.8962     5.0202  -0.179    0.859    
presidentsGeorge.W:strategies2_D  -7.5266     9.7414  -0.773    0.441    
presidentsJohn.C:strategies2_D     1.7179     6.4375   0.267    0.790    
presidentsTom_C:strategies2_D     -1.1020     5.0551  -0.218    0.828    
presidentsGeorge.W:strategies3_A -11.9783     9.3115  -1.286    0.200    
presidentsJohn.C:strategies3_A    -2.8849     5.0866  -0.567    0.571    
presidentsTom_C:strategies3_A     -5.0305     4.4068  -1.142    0.255    
presidentsGeorge.W:strategies3_C  -6.5116     9.7387  -0.669    0.505    
presidentsJohn.C:strategies3_C    -4.3792     6.0389  -0.725    0.469    
presidentsTom_C:strategies3_C     -1.3257     5.3821  -0.246    0.806    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 69.96038)

    Null deviance: 13379  on 199  degrees of freedom
Residual deviance: 12523  on 179  degrees of freedom
AIC: 1439

Number of Fisher Scoring iterations: 2

As you can see, the estimates are exactly the same across all four models.

Data:
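
The equality can also be checked programmatically instead of by comparing printed summaries. The following is a minimal sketch on simulated data (`sim`, `lev_p` and `lev_s` are made-up names for illustration): it confirms that the two formulas expand to the same terms, and that lm() and glm() with the default gaussian family return the same coefficients up to numerical tolerance.

```r
set.seed(1)
lev_p <- c("Bill.C", "George.W", "John.C", "Tom_C")
lev_s <- c("2_A", "2_B", "3_A")

# Balanced hypothetical design: every president/strategy pair appears 10 times.
sim <- expand.grid(presidents = lev_p, strategies = lev_s, rep = 1:10,
                   stringsAsFactors = TRUE)
sim$x <- rnorm(nrow(sim))
sim$y <- rnorm(nrow(sim))

f_long  <- y ~ x + presidents + strategies + presidents:strategies
f_short <- y ~ x + presidents * strategies

# Both formulas expand to the same set of model terms.
same_terms <- identical(attr(terms(f_long), "term.labels"),
                        attr(terms(f_short), "term.labels"))

# glm() defaults to family = gaussian, i.e. ordinary least squares.
fit_lm  <- lm(f_short, data = sim)
fit_glm <- glm(f_short, data = sim)
same_coef <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))

print(c(same_terms = same_terms, same_coef = same_coef))
```

The glm() fits only match lm() here because no family argument is given; with a non-gaussian family the estimates would generally differ.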

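# Small table from the question, used only as a source of president/strategy labels
df = read.table(text = "y  x  presidents  strategies
                20 2   Bill.C      3_A
                10 1   George.W    2_B
                10 1   Tom_C       3_C
                3  2   Tom_C       2_D
                4  4   John.C      3_A
                4  3   Bill.C      2_A", header = TRUE,
                stringsAsFactors = TRUE)  # keep text columns as factors (not the default since R 4.0)

# Simulated data set of 200 rows, resampling the labels above
set.seed(123)
df1 = data.frame(y = sample(1:30, 200, replace = TRUE),
                 x = sample(1:10, 200, replace = TRUE),
                 presidents = sample(df$presidents, 200, replace = TRUE),
                 strategies = sample(df$strategies, 200, replace = TRUE))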
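
If the real data already holds one 0/1 column per president, as the question describes, one possible way to get back to a single factor column (so the formulas above can be used) is sketched below. The `dummies` data frame and its column names are hypothetical, and the approach assumes exactly one dummy equals 1 in every row.

```r
# Hypothetical dummy columns, one per president; exactly one 1 per row.
dummies <- data.frame(
  Bill.C   = c(1, 0, 0, 0),
  George.W = c(0, 1, 0, 0),
  Tom_C    = c(0, 0, 1, 1)
)

# max.col() returns, for each row, the index of the column with the largest
# value, which is the column holding the 1.
idx <- max.col(as.matrix(dummies), ties.method = "first")

# Map the column index back to the column name and store it as a factor.
presidents <- factor(names(dummies)[idx], levels = names(dummies))
print(presidents)
```

The same step could be repeated for the strategy dummies, after which `y ~ x + presidents * strategies` applies directly.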