Ciao Everyone,
我想在R中创建一个虚拟变量。所以我有一个意大利语区域列表,以及一个名为mafia的变量。黑手党变量在具有高水平黑手党渗透的区域中编码为1,在黑手党渗透水平较低的区域中编码为0。
现在,我想创建一个仅考虑具有高级别黑手党的区域的假人。 (= 1)
答案 0 :(得分:1)
如果我正确理解你的问题,添加虚拟变量(也称为固定效果)的典型方法是使用函数factor
。这是一个创建随机数据然后在线性回归中使用factor
的示例:
set.seed(1)
require(data.table)
A = data.table(region = LETTERS[0:3], y = runif(100), x = runif(100), mafia = sample(c(0,1),100,rep = T))
> head(A)
region var mafia
1: A 0.67371223 1
2: B 0.09485786 0
3: C 0.49259612 1
4: A 0.46155184 1
5: B 0.37521653 1
6: C 0.99109922 1
formula = y ~ x + factor(mafia)
reg <- lm(formula, data = A)
> summary(reg)
Call:
lm(formula = formula, data = A)
Residuals:
Min 1Q Median 3Q Max
-0.46965 -0.24828 -0.03362 0.28780 0.51183
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.46196 0.07093 6.513 3.28e-09 ***
x 0.06735 0.10521 0.640 0.524
factor(mafia)1 -0.01830 0.06415 -0.285 0.776
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3189 on 97 degrees of freedom
Multiple R-squared: 0.005498, Adjusted R-squared: -0.01501
F-statistic: 0.2681 on 2 and 97 DF, p-value: 0.7654
如果您只希望对“黑手党”专栏中使用1编码的观察结果进行回归,则更容易:
# Note that A is a data.table
A.mafia = A[ mafia == 1 ]
formula = y ~ x
reg <- lm(formula, data = A.mafia)
summary(reg)
输出:
Call:
lm(formula = formula, data = A.mafia)
Residuals:
Min 1Q Median 3Q Max
-0.47163 -0.26063 -0.05724 0.30166 0.52062
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.43334 0.07926 5.467 1.53e-06 ***
x 0.09017 0.14474 0.623 0.536
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3197 on 49 degrees of freedom
Multiple R-squared: 0.007857, Adjusted R-squared: -0.01239
F-statistic: 0.388 on 1 and 49 DF, p-value: 0.5362