R prediction package vs. Stata margins

Time: 2017-08-01 07:49:59

Tags: r stata prediction lm

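I am switching from Stata to R, and I get inconsistent results between marginal predictions computed with prediction in R and the output of Stata's margins command when the value of a variable is fixed at x. Here is an example: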

library(dplyr)
library(prediction)

d <- data.frame(x1 = factor(c(1,1,1,2,2,2), levels = c(1, 2)),
            x2 = factor(c(1,2,3,1,2,3), levels = c(1, 2, 3)),
            x3 = factor(c(1,2,1,2,1,2), levels = c(1, 2)),
            y = c(3.1, 2.8, 2.5, 4.3, 4.0, 3.5))

m2 <- lm(y ~ x1 + x2 + x3, d)
summary(m2)

marg2a <- prediction(m2, at = list(x2 = "1"))
marg2b <- prediction(m2, at = list(x1 = "1"))

marg2a %>%
  select(x1, fitted) %>%
  group_by(x1) %>%
  summarise(error = mean(fitted))

marg2b %>%
  select(x2, fitted) %>%
  group_by(x2) %>%
  summarise(error = mean(fitted))

This gives the following results:

# A tibble: 2 x 2
      x1    error
   <fctr>    <dbl>
1      1 3.133333
2      2 4.266667


# A tibble: 3 x 2
      x2 error
  <fctr> <dbl>
1      1 3.125
2      2 2.825
3      3 2.425

Whereas if I try to replicate this with Stata's margins, I get:

regress y i.x1 i.x2 i.x3
margins i.x1, at(x2 == 1)
margins i.x2, at(x1 == 1)


------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |
          1  |      3.125   .0829157    37.69   0.017     2.071456    4.178544
          2  |      4.275   .0829157    51.56   0.012     3.221456    5.328544
------------------------------------------------------------------------------

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |
          1  |      3.125   .0829157    37.69   0.017     2.071456    4.178544
          2  |      2.825   .0829157    34.07   0.019     1.771456    3.878544
          3  |      2.425   .0829157    29.25   0.022     1.371456    3.478544
------------------------------------------------------------------------------

The margins for x2 are the same in R and Stata, but for x1 there is a difference and I don't know why. I would really appreciate any help. Thanks,

P

1 answer:

Answer 0 (score: 6)

Your Stata and R code are not equivalent. To replicate that Stata code, you need:

> prediction(m2, at = list(x1 = c("1", "2"), x2 = "1"))
Average predictions for 6 observations:
 at(x1) at(x2) value
      1      1 3.125
      2      1 4.275
> prediction(m2, at = list(x2 = c("1", "2", "3"), x1 = "1"))
Average predictions for 6 observations:
 at(x2) at(x1) value
      1      1 3.125
      2      1 2.825
      3      1 2.425

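This is because when you say margins i.x1, you are asking for predictions on counterfactual versions of the dataset in which x1 is replaced with 1 for every observation, and then with 2, under the additional constraint that x2 is held at 1. The same thing happens in your second Stata example.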
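To make the counterfactual idea concrete, here is a minimal base R sketch that reproduces those averages by hand, without the prediction package. The helper avg_pred is a hypothetical name made up for this illustration, not something from any package; the data and model are the ones from the question.

```r
# Same data and model as in the question.
d <- data.frame(x1 = factor(c(1,1,1,2,2,2), levels = c(1, 2)),
                x2 = factor(c(1,2,3,1,2,3), levels = c(1, 2, 3)),
                x3 = factor(c(1,2,1,2,1,2), levels = c(1, 2)),
                y = c(3.1, 2.8, 2.5, 4.3, 4.0, 3.5))
m2 <- lm(y ~ x1 + x2 + x3, d)

# Hypothetical helper (illustration only): overwrite x1 and x2 in EVERY row,
# leave x3 as observed, predict, and average over all rows.
avg_pred <- function(model, data, x1_value, x2_value) {
  cf <- data
  cf$x1 <- factor(x1_value, levels = levels(data$x1))
  cf$x2 <- factor(x2_value, levels = levels(data$x2))
  mean(predict(model, newdata = cf))
}

p11 <- avg_pred(m2, d, "1", "1")  # 3.125
p21 <- avg_pred(m2, d, "2", "1")  # 4.275
print(c(p11, p21))
```

Note that all six rows enter each average, so x3 keeps its full observed distribution in both counterfactual datasets.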

The confusion arises because Stata's margins command is ambiguous, or rather offers two syntactic forms that produce the same output. One is your code:

. margins i.x1, at(x2 == 1)

Predictive margins                              Number of obs     =          6
Model VCE    : OLS

Expression   : Linear prediction, predict()
at           : x2              =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |
          1  |      3.125   .0829156    37.69   0.017     2.071457    4.178543
          2  |      4.275   .0829156    51.56   0.012     3.221457    5.328543
------------------------------------------------------------------------------

The other is more explicit about what is actually going on above:

. margins, at(x1 = (1 2) x2 == 1)

Predictive margins                              Number of obs     =          6
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : x1              =           1
               x2              =           1

2._at        : x1              =           2
               x2              =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |      3.125   .0829156    37.69   0.017     2.071457    4.178543
          2  |      4.275   .0829156    51.56   0.012     3.221457    5.328543
------------------------------------------------------------------------------
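
For comparison, the 3.133 and 4.267 in the question appear to come from a different calculation: x2 is fixed at 1, but the fitted values are then averaged within the observed x1 groups, so each average uses only three rows and x1 is never set counterfactually. The following base R sketch reflects my reading of the question's pipeline and is not a statement about the internals of the prediction package.

```r
# Same data and model as in the question.
d <- data.frame(x1 = factor(c(1,1,1,2,2,2), levels = c(1, 2)),
                x2 = factor(c(1,2,3,1,2,3), levels = c(1, 2, 3)),
                x3 = factor(c(1,2,1,2,1,2), levels = c(1, 2)),
                y = c(3.1, 2.8, 2.5, 4.3, 4.0, 3.5))
m2 <- lm(y ~ x1 + x2 + x3, d)

# Fix only x2 at 1; x1 and x3 stay as observed.
cf <- d
cf$x2 <- factor("1", levels = levels(d$x2))
fitted_cf <- predict(m2, newdata = cf)

# Average within the OBSERVED x1 groups, as the group_by/summarise step does.
by_group <- tapply(fitted_cf, d$x1, mean)
print(by_group)          # 3.133333 for x1 = 1, 4.266667 for x1 = 2

# The mix of x3 differs between the two observed x1 groups,
# which is why these averages are not the same as the Stata margins.
print(table(x1 = d$x1, x3 = d$x3))
```

Within each observed x1 group the distribution of x3 is unbalanced (two to one), whereas margins averages over all six rows, where x3 is split evenly. For x2 the grouped averages happen to agree with Stata because x3 is balanced within every x2 level in this dataset.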