ggplot2: how to get robust confidence intervals for predictions in geom_smooth?

Time: 2017-07-25 21:08:23

Tags: r ggplot2 regression

Consider this simple example:

library(tidyverse)

dataframe <- tibble(x = c(1,2,3,4,5,6),
                    y = c(12,24,24,34,12,15))
> dataframe
# A tibble: 6 x 2
      x     y
  <dbl> <dbl>
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15

dataframe %>% ggplot(., aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y~x)

Here the standard errors are computed with the default option. However, I would like to use the robust variance-covariance matrix available in the packages sandwich and lmtest.

That is, using vcovHC(mymodel, "HC3").

Is there a simple way to get that using the geom_smooth() function?


2 answers:

Answer 0: (score: 1)

HC robust SEs (simple)

This is now easy to do thanks to the estimatr package and its lm_robust family of functions. For example:

library(tidyverse)
library(estimatr)

dataframe %>% 
  ggplot(aes(x = x, y = y)) + 
  geom_point() + 
  geom_smooth(method = 'lm_robust', formula = y~x, fill="#E41A1C") + ## Robust (HC) SEs
  geom_smooth(method = 'lm', formula = y~x) + ## Just for comparison
  theme_minimal()
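
The question asks for HC3 specifically, whereas lm_robust uses HC2 by default. A minimal sketch, assuming that geom_smooth() forwards method.args to lm_robust() and that se_type = "HC3" is accepted there (both are documented arguments, but the combination is not taken from the original answer). The object name p_hc3 is only illustrative.

```r
library(ggplot2)
library(estimatr)

## Same toy data as in the question
dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))

## Assumption: method.args is passed through to lm_robust(),
## so se_type selects the HC variant used for the ribbon
p_hc3 <- ggplot(dataframe, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm_robust", formula = y ~ x,
              method.args = list(se_type = "HC3"),
              fill = "#E41A1C") +
  theme_minimal()

## print(p_hc3) draws the plot
```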


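HAC robust SEs (a bit more legwork)

One caveat is that estimatr does not yet support HAC (i.e. heteroskedasticity and autocorrelation consistent) SEs à la Newey-West. However, you can obtain these manually with the sandwich package (which is what the original question asked for anyway) and then plot them with geom_ribbon().

For the record, HAC SEs don't make much sense for this particular dataset, but here is an example of how you could do it, adapted from an excellent answer on a related topic.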

reg1 <- lm(y~x, data = dataframe)

## Generate a prediction DF
pred_df <-
  data.frame(predict(reg1, se.fit = TRUE, interval = "confidence")) %>% 
  as_tibble() 
## Clean up a little bit (optional)
colnames(pred_df) <- sub("^fit\\.", "", colnames(pred_df)) ## Strip only the leading "fit." prefix

## Get the design matrix
X_mat <- model.matrix(reg1)

## Get HAC VCOV matrix and calculate SEs
library(sandwich)
v_hac <- NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) ## HAC VCOV (adjusted for small data sample)
var_fit_hac <- rowSums((X_mat %*% v_hac) * X_mat)  ## Point-wise variance for predicted mean

## Add these to pred_df
pred_df <-
  pred_df %>%
  mutate(se_fit_hac = sqrt(var_fit_hac)) %>%
  mutate(
    lwr_hac = fit - qt(0.975, df=df)*se_fit_hac,
    upr_hac = fit + qt(0.975, df=df)*se_fit_hac
    )

bind_cols(
  dataframe,
  pred_df
  ) %>%
  ggplot(aes(x = x, y = y, ymin=lwr_hac, ymax=upr_hac)) + 
  geom_point() + 
  geom_ribbon(fill="#E41A1C", alpha=0.3, col=NA) + ## Robust (HAC) SEs
  geom_smooth(method = 'lm', formula = y~x) + ## Just for comparison
  theme_minimal()


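Note that you can also use this approach to manually calculate and plot other robust SE predictions (e.g. HC1, HC2, etc.) if you wish. All you need to do is use the relevant sandwich estimator. For instance, using vcovHC(reg1, type = "HC2") instead of the NeweyWest() call above will give you the same HC-robust CI as the first example, which uses the estimatr package.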
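
The rowSums((X_mat %*% v_hac) * X_mat) step computes, for each row x_i of the design matrix, the quadratic form x_i' V x_i, i.e. the diagonal of X V X'. As a sanity check (a sketch in base R only, with the classical vcov(reg1) plugged in as V instead of a robust matrix), the same formula should reproduce the se.fit values that predict() reports. The names v_classical, se_manual, se_diag and se_predict are illustrative.

```r
## Same toy data and model as above
dataframe <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                        y = c(12, 24, 24, 34, 12, 15))
reg1 <- lm(y ~ x, data = dataframe)
X_mat <- model.matrix(reg1)

## Classical (non-robust) VCOV, used here only to validate the formula
v_classical <- vcov(reg1)

## Row-wise quadratic form x_i' V x_i without building the full n x n matrix
se_manual <- sqrt(rowSums((X_mat %*% v_classical) * X_mat))

## Equivalent but less efficient: diagonal of X V X'
se_diag <- sqrt(diag(X_mat %*% v_classical %*% t(X_mat)))

## Reference values from predict()
se_predict <- predict(reg1, se.fit = TRUE)$se.fit

all.equal(unname(se_manual), unname(se_predict))
```

Swapping v_classical for any other coefficient covariance matrix (NeweyWest, vcovHC, ...) changes only V, not the formula.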

Answer 1: (score: 0)

I'm quite new to this whole robust SE thing, but I was able to produce the following:

zz = '
x y
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15 
'

df <- read.table(text = zz, header = TRUE)
df

library(sandwich)
library(lmtest)

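lm.model <- lm(y ~ x, data = df)
coef(lm.model)
## Point-wise SEs of the fitted values: x_i' V x_i, with V the HC3 matrix.
## (sqrt(diag(V)) alone gives one SE per coefficient, not one per observation.)
X <- model.matrix(lm.model)
se <- sqrt(rowSums((X %*% vcovHC(lm.model, type = "HC3")) * X))
fit <- predict(lm.model)
predframe <- with(df, data.frame(x,
                                 y = fit,
                                 lwr = fit - 1.96 * se,
                                 upr = fit + 1.96 * se))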

library(ggplot2)
ggplot(df, aes(x = x, y = y))+
  geom_point()+
  geom_line(data = predframe)+
  geom_ribbon(data = predframe, aes(ymin = lwr,ymax = upr), alpha = 0.3)
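
The band above uses the normal critical value 1.96. With only six observations and two estimated coefficients, a t-based critical value (as in the other answer) is noticeably larger. A small base-R sketch of the comparison, assuming the same y ~ x fit; crit_normal and crit_t are illustrative names:

```r
## Same toy data and model as above
df <- data.frame(x = 1:6, y = c(12, 24, 24, 34, 12, 15))
lm.model <- lm(y ~ x, data = df)

## Normal approximation versus t distribution with n - 2 = 4 residual df
crit_normal <- qnorm(0.975)
crit_t <- qt(0.975, df = df.residual(lm.model))

c(normal = crit_normal, t = crit_t)  ## about 1.96 versus about 2.78
```

Replacing 1.96 with crit_t widens the ribbon accordingly.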
