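OK, I have two questions, perhaps related, about dummies and factors. I'll use an example very similar to my database. I have a column containing 20 different names, say presidents of a country (e.g., "George W.", "Bill C.", etc.). I also have a column containing 25 strategies (e.g., "str_1", "str2", etc.). Both are in the same data set, say "dat", along with other variables such as y and x.
For example: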
=============================
y   x  presidents  strategies
=============================
20  2  Bill.C      3_A
10  1  George.W    2_B
10  1  Tom_C       3_C
3   2  Tom_C       2_D
4   4  John.C      3_A
4   3  Bill.C      2_A
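I want to run a regression of the form y ~ x + dummies for presidents + dummies for strategies + interactions between presidents and strategies.
I have created dummies for each of the 20 presidents and each of the 25 strategies, but I don't know how to create the interactions between each president and each strategy (this is the first part of my question). Supposing I can do that easily, is there another way to specify my regression without having to write out the 20 * 25 interactions one by one? (I know Stata has commands for this kind of problem.)
Maybe these are separate questions, but I'm not sure.
Thanks in advance.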
Answer 0 (score: 0)
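lm and glm automatically convert factor variables into the corresponding dummy variables (leaving one level out as the reference category). So you do not need to build the dummies or the interactions by hand; you can simply do the following: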
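# df1 is the simulated data frame defined at the end of this answer
# Main effects plus interaction, written out in full
mod1 = lm(y ~ x + presidents + strategies + presidents:strategies, data = df1)
# Same model: a*b is shorthand for a + b + a:b
mod2 = lm(y ~ x + presidents*strategies, data = df1)
# glm defaults to family = gaussian, so these are the same linear models
mod3 = glm(y ~ x + presidents + strategies + presidents:strategies, data = df1)
mod4 = glm(y ~ x + presidents*strategies, data = df1)
summary(mod1)
summary(mod2)
summary(mod3)
summary(mod4)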
Result:
> summary(mod1)
Call:
lm(formula = y ~ x + presidents + strategies + presidents:strategies,
data = df1)
Residuals:
Min 1Q Median 3Q Max
-17.3690 -6.1273 -0.1699 6.4295 17.4156
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.4782 3.0799 4.701 5.15e-06 ***
x -0.1692 0.2141 -0.790 0.431
presidentsGeorge.W 11.1984 8.8283 1.268 0.206
presidentsJohn.C 4.1281 4.2305 0.976 0.330
presidentsTom_C 4.9604 3.6271 1.368 0.173
strategies2_B 1.6203 3.5736 0.453 0.651
strategies2_D -1.7246 3.6550 -0.472 0.638
strategies3_A 1.7663 3.2966 0.536 0.593
strategies3_C -0.5787 3.8440 -0.151 0.881
presidentsGeorge.W:strategies2_B -9.9934 10.0125 -0.998 0.320
presidentsJohn.C:strategies2_B -1.5192 5.8696 -0.259 0.796
presidentsTom_C:strategies2_B -0.8962 5.0202 -0.179 0.859
presidentsGeorge.W:strategies2_D -7.5266 9.7414 -0.773 0.441
presidentsJohn.C:strategies2_D 1.7179 6.4375 0.267 0.790
presidentsTom_C:strategies2_D -1.1020 5.0551 -0.218 0.828
presidentsGeorge.W:strategies3_A -11.9783 9.3115 -1.286 0.200
presidentsJohn.C:strategies3_A -2.8849 5.0866 -0.567 0.571
presidentsTom_C:strategies3_A -5.0305 4.4068 -1.142 0.255
presidentsGeorge.W:strategies3_C -6.5116 9.7387 -0.669 0.505
presidentsJohn.C:strategies3_C -4.3792 6.0389 -0.725 0.469
presidentsTom_C:strategies3_C -1.3257 5.3821 -0.246 0.806
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.364 on 179 degrees of freedom
Multiple R-squared: 0.064, Adjusted R-squared: -0.04058
F-statistic: 0.612 on 20 and 179 DF, p-value: 0.9007
> summary(mod2)
Call:
lm(formula = y ~ x + presidents * strategies, data = df1)
Residuals:
Min 1Q Median 3Q Max
-17.3690 -6.1273 -0.1699 6.4295 17.4156
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.4782 3.0799 4.701 5.15e-06 ***
x -0.1692 0.2141 -0.790 0.431
presidentsGeorge.W 11.1984 8.8283 1.268 0.206
presidentsJohn.C 4.1281 4.2305 0.976 0.330
presidentsTom_C 4.9604 3.6271 1.368 0.173
strategies2_B 1.6203 3.5736 0.453 0.651
strategies2_D -1.7246 3.6550 -0.472 0.638
strategies3_A 1.7663 3.2966 0.536 0.593
strategies3_C -0.5787 3.8440 -0.151 0.881
presidentsGeorge.W:strategies2_B -9.9934 10.0125 -0.998 0.320
presidentsJohn.C:strategies2_B -1.5192 5.8696 -0.259 0.796
presidentsTom_C:strategies2_B -0.8962 5.0202 -0.179 0.859
presidentsGeorge.W:strategies2_D -7.5266 9.7414 -0.773 0.441
presidentsJohn.C:strategies2_D 1.7179 6.4375 0.267 0.790
presidentsTom_C:strategies2_D -1.1020 5.0551 -0.218 0.828
presidentsGeorge.W:strategies3_A -11.9783 9.3115 -1.286 0.200
presidentsJohn.C:strategies3_A -2.8849 5.0866 -0.567 0.571
presidentsTom_C:strategies3_A -5.0305 4.4068 -1.142 0.255
presidentsGeorge.W:strategies3_C -6.5116 9.7387 -0.669 0.505
presidentsJohn.C:strategies3_C -4.3792 6.0389 -0.725 0.469
presidentsTom_C:strategies3_C -1.3257 5.3821 -0.246 0.806
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.364 on 179 degrees of freedom
Multiple R-squared: 0.064, Adjusted R-squared: -0.04058
F-statistic: 0.612 on 20 and 179 DF, p-value: 0.9007
> summary(mod3)
Call:
glm(formula = y ~ x + presidents + strategies + presidents:strategies,
data = df1)
Deviance Residuals:
Min 1Q Median 3Q Max
-17.3690 -6.1273 -0.1699 6.4295 17.4156
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.4782 3.0799 4.701 5.15e-06 ***
x -0.1692 0.2141 -0.790 0.431
presidentsGeorge.W 11.1984 8.8283 1.268 0.206
presidentsJohn.C 4.1281 4.2305 0.976 0.330
presidentsTom_C 4.9604 3.6271 1.368 0.173
strategies2_B 1.6203 3.5736 0.453 0.651
strategies2_D -1.7246 3.6550 -0.472 0.638
strategies3_A 1.7663 3.2966 0.536 0.593
strategies3_C -0.5787 3.8440 -0.151 0.881
presidentsGeorge.W:strategies2_B -9.9934 10.0125 -0.998 0.320
presidentsJohn.C:strategies2_B -1.5192 5.8696 -0.259 0.796
presidentsTom_C:strategies2_B -0.8962 5.0202 -0.179 0.859
presidentsGeorge.W:strategies2_D -7.5266 9.7414 -0.773 0.441
presidentsJohn.C:strategies2_D 1.7179 6.4375 0.267 0.790
presidentsTom_C:strategies2_D -1.1020 5.0551 -0.218 0.828
presidentsGeorge.W:strategies3_A -11.9783 9.3115 -1.286 0.200
presidentsJohn.C:strategies3_A -2.8849 5.0866 -0.567 0.571
presidentsTom_C:strategies3_A -5.0305 4.4068 -1.142 0.255
presidentsGeorge.W:strategies3_C -6.5116 9.7387 -0.669 0.505
presidentsJohn.C:strategies3_C -4.3792 6.0389 -0.725 0.469
presidentsTom_C:strategies3_C -1.3257 5.3821 -0.246 0.806
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 69.96038)
Null deviance: 13379 on 199 degrees of freedom
Residual deviance: 12523 on 179 degrees of freedom
AIC: 1439
Number of Fisher Scoring iterations: 2
> summary(mod4)
Call:
glm(formula = y ~ x + presidents * strategies, data = df1)
Deviance Residuals:
Min 1Q Median 3Q Max
-17.3690 -6.1273 -0.1699 6.4295 17.4156
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.4782 3.0799 4.701 5.15e-06 ***
x -0.1692 0.2141 -0.790 0.431
presidentsGeorge.W 11.1984 8.8283 1.268 0.206
presidentsJohn.C 4.1281 4.2305 0.976 0.330
presidentsTom_C 4.9604 3.6271 1.368 0.173
strategies2_B 1.6203 3.5736 0.453 0.651
strategies2_D -1.7246 3.6550 -0.472 0.638
strategies3_A 1.7663 3.2966 0.536 0.593
strategies3_C -0.5787 3.8440 -0.151 0.881
presidentsGeorge.W:strategies2_B -9.9934 10.0125 -0.998 0.320
presidentsJohn.C:strategies2_B -1.5192 5.8696 -0.259 0.796
presidentsTom_C:strategies2_B -0.8962 5.0202 -0.179 0.859
presidentsGeorge.W:strategies2_D -7.5266 9.7414 -0.773 0.441
presidentsJohn.C:strategies2_D 1.7179 6.4375 0.267 0.790
presidentsTom_C:strategies2_D -1.1020 5.0551 -0.218 0.828
presidentsGeorge.W:strategies3_A -11.9783 9.3115 -1.286 0.200
presidentsJohn.C:strategies3_A -2.8849 5.0866 -0.567 0.571
presidentsTom_C:strategies3_A -5.0305 4.4068 -1.142 0.255
presidentsGeorge.W:strategies3_C -6.5116 9.7387 -0.669 0.505
presidentsJohn.C:strategies3_C -4.3792 6.0389 -0.725 0.469
presidentsTom_C:strategies3_C -1.3257 5.3821 -0.246 0.806
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 69.96038)
Null deviance: 13379 on 199 degrees of freedom
Residual deviance: 12523 on 179 degrees of freedom
AIC: 1439
Number of Fisher Scoring iterations: 2
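As you can see, the estimates are exactly the same in all four models: presidents*strategies expands to presidents + strategies + presidents:strategies, and glm with its default gaussian family fits the same linear model as lm.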
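If you would rather confirm the equivalence programmatically than by comparing printed tables, you can compare the coefficient vectors directly. This is only a minimal sketch: the data frame toy, its size, and its level names are assumptions made up for illustration, not the asker's data.

```r
# Hypothetical toy data (sizes and level names are assumptions for illustration)
set.seed(1)
toy <- data.frame(
  y = rnorm(120),
  x = rnorm(120),
  presidents = factor(sample(c("Bill.C", "George.W", "Tom_C"), 120, replace = TRUE)),
  strategies = factor(sample(c("2_A", "3_A", "3_C"), 120, replace = TRUE))
)

# Interaction written out in full vs. the `*` shorthand
fit_long  <- lm(y ~ x + presidents + strategies + presidents:strategies, data = toy)
fit_short <- lm(y ~ x + presidents * strategies, data = toy)

# glm() with its default family (gaussian, identity link)
fit_glm <- glm(y ~ x + presidents * strategies, data = toy)

# all.equal() compares names and values up to numerical tolerance
same_formula <- isTRUE(all.equal(coef(fit_long), coef(fit_short)))
same_as_glm  <- isTRUE(all.equal(coef(fit_short), coef(fit_glm)))
print(c(same_formula = same_formula, same_as_glm = same_as_glm))
```

Both flags should come out TRUE, since the two formulas describe the same set of terms and the gaussian glm is ordinary least squares.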
Data:
# The six example rows from the question
df = read.table(text = "y x presidents strategies
20 2 Bill.C 3_A
10 1 George.W 2_B
10 1 Tom_C 3_C
3 2 Tom_C 2_D
4 4 John.C 3_A
4 3 Bill.C 2_A", header = TRUE)
# Simulate 200 rows by resampling the observed presidents and strategies
set.seed(123)
df1 = data.frame(y = sample(1:30, 200, replace = TRUE),
                 x = sample(1:10, 200, replace = TRUE),
                 presidents = sample(df$presidents, 200, replace = TRUE),
                 strategies = sample(df$strategies, 200, replace = TRUE))
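To see the dummy and interaction columns that R builds from the two factors, you can inspect the design matrix with model.matrix. The sketch below re-enters the six example rows as explicit factors in a data frame named small (a name chosen here for illustration), and also shows relevel for picking a different reference category; "Tom_C" is an arbitrary choice.

```r
# The six example rows, with both columns as explicit factors
small <- data.frame(
  presidents = factor(c("Bill.C", "George.W", "Tom_C", "Tom_C", "John.C", "Bill.C")),
  strategies = factor(c("3_A", "2_B", "3_C", "2_D", "3_A", "2_A"))
)

# Design matrix for main effects plus all pairwise interactions
X <- model.matrix(~ presidents * strategies, data = small)
print(colnames(X))

# 1 intercept + 3 president dummies + 4 strategy dummies + 3 * 4 interactions
print(ncol(X))

# Change the reference category of presidents (arbitrary choice for illustration)
small$presidents <- relevel(small$presidents, ref = "Tom_C")
print(levels(small$presidents))
```

With the real data the same call would produce one column per non-reference president, one per non-reference strategy, and one per pair of non-reference levels. Combinations that never occur in the data cannot be estimated, and lm reports their coefficients as NA.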