Dummy variable in R

Asked: 2017-03-02 17:13:31

Tags: r dummy-variable

Ciao Everyone,

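I would like to create a dummy variable in R. I have a list of Italian regions and a variable called mafia. The mafia variable is coded 1 in regions with a high level of mafia infiltration and 0 in regions with a low level of mafia infiltration.

Now, I would like to create a dummy that only takes into account the regions with a high level of mafia (= 1).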

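1 Answer:

Answer 0: (score: 1)

If I understand your question correctly, the typical way to add a dummy variable (also called a fixed effect) is to use the function factor. Here is an example that creates random data and then uses factor in a linear regression: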

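set.seed(1)
library(data.table)
# 100 rows: regions cycle through A, B, C; y and x are uniform noise; mafia is a random 0/1 flag
A = data.table(region = rep(LETTERS[1:3], length.out = 100), y = runif(100), x = runif(100), mafia = sample(c(0, 1), 100, replace = TRUE))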
> head(A)
   region        var mafia
1:      A 0.67371223     1
2:      B 0.09485786     0
3:      C 0.49259612     1
4:      A 0.46155184     1
5:      B 0.37521653     1
6:      C 0.99109922     1

formula = y ~ x + factor(mafia)

reg <- lm(formula, data = A)

> summary(reg)

Call:
lm(formula = formula, data = A)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.46965 -0.24828 -0.03362  0.28780  0.51183 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.46196    0.07093   6.513 3.28e-09 ***
x               0.06735    0.10521   0.640    0.524    
factor(mafia)1 -0.01830    0.06415  -0.285    0.776    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3189 on 97 degrees of freedom
Multiple R-squared:  0.005498,  Adjusted R-squared:  -0.01501 
F-statistic: 0.2681 on 2 and 97 DF,  p-value: 0.7654
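
To see what factor does inside the model, it can help to look at the design matrix that lm builds. The following is a minimal sketch in base R on made-up data, assuming mafia is already stored as 0/1; the data frame d and the column mafia_dummy are illustrative names, not something from the question.

```r
set.seed(1)
# Hypothetical toy data: d and its columns are illustrative only
d <- data.frame(
  y = runif(20),
  x = runif(20),
  mafia = rep(c(0, 1), times = 10)
)

# model.matrix() shows the columns lm() actually uses:
# factor(mafia) becomes a single 0/1 column named "factor(mafia)1"
X <- model.matrix(y ~ x + factor(mafia), data = d)
colnames(X)

# An explicit dummy: 1 for high-mafia rows, 0 otherwise
d$mafia_dummy <- as.integer(d$mafia == 1)

# Both specifications use the same design matrix, so the estimates agree
fit_factor <- lm(y ~ x + factor(mafia), data = d)
fit_dummy  <- lm(y ~ x + mafia_dummy, data = d)
all.equal(unname(coef(fit_factor)), unname(coef(fit_dummy)))
```

So if mafia is already 0/1, wrapping it in factor mostly changes the coefficient label. factor matters more when the variable has more than two levels (for example region), because each level other than the baseline then gets its own dummy column.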

If you only want to run the regression on the observations coded 1 in the mafia column, it is even easier:

# Note that A is a data.table
A.mafia = A[ mafia == 1 ]
formula = y ~ x
reg <- lm(formula, data = A.mafia)
summary(reg)

Output:

Call:
lm(formula = formula, data = A.mafia)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47163 -0.26063 -0.05724  0.30166  0.52062 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.43334    0.07926   5.467 1.53e-06 ***
x            0.09017    0.14474   0.623    0.536    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3197 on 49 degrees of freedom
Multiple R-squared:  0.007857,  Adjusted R-squared:  -0.01239 
F-statistic: 0.388 on 1 and 49 DF,  p-value: 0.5362
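
If you would rather not create a second table, the subset argument of lm should give the same fit. A small sketch in base R on made-up data (the data frame d is hypothetical and only stands in for your regions data):

```r
set.seed(1)
# Hypothetical toy data: 10 low-mafia rows and 10 high-mafia rows
d <- data.frame(
  y = runif(20),
  x = runif(20),
  mafia = rep(c(0, 1), times = 10)
)

# Fit on the high-mafia rows only, without copying the data
fit_subset <- lm(y ~ x, data = d, subset = mafia == 1)

# Same model fitted on an explicitly filtered data frame
fit_filtered <- lm(y ~ x, data = d[d$mafia == 1, ])

nobs(fit_subset)                                  # number of rows used in the fit
all.equal(coef(fit_subset), coef(fit_filtered))   # the two fits agree
```

Keep in mind that within the mafia == 1 subset the mafia variable is constant, so it cannot also enter that model as a regressor.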