Question

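Suppose I have two data frames, one for 2015 and one for 2016. I want to run a regression on each data frame and plot one coefficient from each regression together with its confidence interval. For example: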

set.seed(1020022316)
library(dplyr)
library(stargazer)

df16 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.5 * x1 + 2 * t + e) %>%
  select(-e)

df15 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.75 * x1 + 2.5 * t + e) %>%
  select(-e)

lm16 <- lm(y ~ x1 + t, data = df16)

lm15 <- lm(y ~ x1 + t, data = df15)

stargazer(lm15, lm16, type="text", style = "aer", ci = TRUE, ci.level = 0.95)

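I would like to plot t = 1.558 at x = 2015 and t = 2.797 at x = 2016, each with its respective 95% CI. What is the best way to do this?

I could do it "by hand", but I hope there is a better way.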

library(ggplot2)
df.plot <-
  data.frame(
    y = c(lm15$coefficients[['t']], lm16$coefficients[['t']]),
    x = c(2015, 2016),
    lb = c(
      confint(lm15, 't', level = 0.95)[1],
      confint(lm16, 't', level = 0.95)[1]
    ),
    ub = c(
      confint(lm15, 't', level = 0.95)[2],
      confint(lm16, 't', level = 0.95)[2]
    )
  )
df.plot %>% ggplot(aes(x, y)) + geom_point() +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) + 
  geom_hline(aes(yintercept=0), linetype="dashed")

Best: figure quality (looks good), elegant code, and easy to extend (to more than 2 regressions).

Answer 1

This is a bit too long for a comment, so I am posting it as a partial answer.

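It is not clear from your post whether your main problem is getting the data into the right shape or the plotting itself. But to follow up on one of the comments, let me show you how to run several models using broom and dplyr, which makes the plotting easy. Consider the mtcars dataset:

library(dplyr)
library(broom)
library(ggplot2)

# Fit one regression per cyl group; tidy() returns the coefficients
# together with their confidence intervals
models <- mtcars %>% group_by(cyl) %>%
  do(data.frame(tidy(lm(mpg ~ disp, data = .), conf.int = TRUE)))

head(models) # I have abbreviated the following output a bit

   cyl        term estimate std.error statistic   p.value conf.low conf.high
 (dbl)       (chr)    (dbl)     (dbl)     (dbl)     (dbl)    (dbl)     (dbl)
     4 (Intercept)  40.8720    3.5896     11.39 0.0000012   32.752  48.99221
     4        disp  -0.1351    0.0332     -4.07 0.0027828   -0.210  -0.06010
     6 (Intercept)  19.0820    2.9140      6.55 0.0012440   11.591  26.57264
     6        disp   0.0036    0.0156      0.23 0.8259297   -0.036   0.04360

You can see that this gives you all coefficients and confidence intervals in one tidy data frame, which makes plotting with ggplot easier. For example, if your datasets have identical content, you could add a year identifier to each of them (e.g. df1$year <- 2000; df2$year <- 2001, etc.) and then bind them together (e.g. using bind_rows, or you can use the .id option of bind_rows). You can then use the year identifier instead of cyl in the example above.

The plotting is then simple. To use the mtcars data again, let's plot only the coefficients for disp (though you could also use faceting, grouping, etc.):

ggplot(filter(models, term == "disp"), aes(x = cyl, y = estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high))

Using your data:

# Naming the inputs makes .id hold the year instead of "1" and "2"
df <- bind_rows("2016" = df16, "2015" = df15, .id = "years")

models <- df %>% group_by(years) %>%
  do(data.frame(tidy(lm(y ~ x1 + t, data = .), conf.int = TRUE))) %>%
  filter(term == "t")

ggplot(models, aes(x = years, y = estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high))

Note that you can easily add more and more models just by binding more and more data to the main data frame. If you want to plot more than one coefficient, you can also easily use faceting, grouping, or position dodging to adjust the look of the corresponding plot.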
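As a rough illustration of the multi-coefficient case, the sketch below plots both slopes (x1 and t) with one facet per term. It is only one possible layout; the simulated data, the helper name `make_df`, and the column name `year` are assumptions introduced here, not part of the answer.

```r
library(dplyr)
library(broom)
library(ggplot2)

set.seed(1)

# Hypothetical stand-ins for the question's df15 and df16
make_df <- function(b_x1, b_t, n = 1000) {
  x1 <- rnorm(n, 0, 2)
  t  <- sample(c(0, 1), n, replace = TRUE)
  data.frame(x1 = x1, t = t, y = b_x1 * x1 + b_t * t + rnorm(n, 0, 10))
}
df15 <- make_df(0.75, 2.5)
df16 <- make_df(0.50, 2.0)

# Naming the inputs makes .id carry the year labels
df <- bind_rows("2015" = df15, "2016" = df16, .id = "year")

# One regression per year; keep every slope, drop the intercept
models <- df %>%
  group_by(year) %>%
  do(tidy(lm(y ~ x1 + t, data = .), conf.int = TRUE)) %>%
  filter(term != "(Intercept)")

# One panel per coefficient, each with its own y scale
p <- ggplot(models, aes(x = year, y = estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ term, scales = "free_y")
print(p)
```

Adding another year should only require passing one more named data frame to bind_rows.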

Answer 2

This is my current solution:

gen_df_plot <- function(reg, coef_name){
  df <- data.frame(y = reg$coefficients[[coef_name]],
                   lb = confint(reg, coef_name, level = 0.95)[1],
                   ub = confint(reg, coef_name, level = 0.95)[2])
  return(df)
}

df.plot <- lapply(list(lm15,lm16), gen_df_plot, coef_name = 't')

df.plot <- data.table::rbindlist(df.plot)

df.plot$x <- as.factor(c(2015, 2016))

df.plot %>% ggplot(aes(x, y)) + geom_point(size=4) +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1, linetype="dotted") + 
  geom_hline(aes(yintercept=0), linetype="dashed") + theme_bw()

I don't like it, but it works.
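Because the question asks for something that extends beyond two regressions, here is a minimal sketch of the same idea driven by a named list of models, using base R plus ggplot2 only. The helper name `coef_ci`, the simulated data from `sim()`, and the list names are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)

# Hypothetical simulated data standing in for df15 / df16
sim <- function(b_t, n = 1000) {
  x1 <- rnorm(n, 0, 2)
  t  <- sample(c(0, 1), n, replace = TRUE)
  data.frame(x1 = x1, t = t, y = 0.5 * x1 + b_t * t + rnorm(n, 0, 10))
}

# The list names double as x-axis labels
regs <- list(
  "2015" = lm(y ~ x1 + t, data = sim(2.5)),
  "2016" = lm(y ~ x1 + t, data = sim(2.0))
)

# Hypothetical helper: one row with the estimate and its CI bounds
coef_ci <- function(reg, coef_name, level = 0.95) {
  ci <- confint(reg, coef_name, level = level)
  data.frame(y = coef(reg)[[coef_name]], lb = ci[1, 1], ub = ci[1, 2])
}

# Stack one row per model and attach the labels from the list names
df.plot <- do.call(rbind, lapply(regs, coef_ci, coef_name = "t"))
df.plot$x <- factor(names(regs))

p <- ggplot(df.plot, aes(x, y)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  theme_bw()
print(p)
```

With this layout, a third regression is just one more named entry in `regs`.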

Answer 3

Here is generic code. I changed how "x" is defined so that you do not have to worry about the factor being reordered alphabetically.
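To illustrate the reordering issue in isolation, here is a small sketch (not the answerer's code) showing one common way to keep factor levels in the order the labels were supplied. The labels are hypothetical.

```r
# Hypothetical labels, listed in the order they should appear on the x axis
labels <- c("wave_2", "wave_10", "wave_1")

# Default: levels are the sorted unique values, so the given order is lost
alphabetical <- factor(labels)

# Explicit levels: the order of first appearance is preserved
as_given <- factor(labels, levels = unique(labels))

print(levels(alphabetical))
print(levels(as_given))
```

ggplot2 draws discrete axes in level order, so defining x this way keeps the groups in the intended sequence.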


Plotting regression coefficients with confidence intervals
