How can I inset the residual distribution of a linear regression, with its x-axis perpendicular to the fitted line?

Asked: 2017-11-30 17:40:11

Tags: r data-visualization linear-regression

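I have seen some papers that present the residuals of a (not strictly) regression analysis in a clever way: they plot the residual distribution perpendicular to the fitted line. Example images are in Figure 2 or Figure 5 (linear regression) here: https://www.nature.com/articles/nn.4538#results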

My R example:

The example data are taken from: https://www.r-bloggers.com/simple-linear-regression-2/

Example data:

alligator = data.frame(
  lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
               3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
               3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

Linear regression model:

reg <- lm(alligator$lnWeight ~ alligator$lnLength)

Scatter plot:

plot(alligator,
   xlab = "Snout vent length (inches) on log scale",
   ylab = "Weight (pounds) on log scale",
   main = "Alligators in Central Florida"
)

Fitted line:

abline(reg,col = "black", lwd = 1)

Residual distribution (histogram):

hist(reg$residuals, 10, xaxt='n', yaxt='n', ann=FALSE)

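I would like to inset the histogram on top of the linear regression plot, as in the example images in Figure 2 or Figure 5 (linear regression): https://www.nature.com/articles/nn.4538#results

Thanks for your help.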

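1 answer:

Answer 0: (score: 2)

This overlays the residual histogram on the main plot. You will need to do some more work to rotate it so that it sits perpendicular to the fitted line, as in the example you cited.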

library("ggplot2")
theme_set(theme_minimal())

alligator = data.frame(
  lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
               3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
               3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

reg <- lm(alligator$lnWeight ~ alligator$lnLength)


# make main plot, with best fit line (set se=TRUE to get error ribbon)
main_plot <- ggplot(alligator, aes(x=lnLength, y=lnWeight)) + 
  geom_point() + geom_smooth(method="lm", se=FALSE) + 
  scale_y_continuous(limits=c(0,7))

# create another plot, histogram of the residuals 
added_plot <- ggplot(data.frame(resid=reg$residuals), aes(x=resid)) + 
  geom_histogram(bins=5) + 
  theme(panel.grid=element_blank(), 
        axis.text.y=element_blank(), 
        axis.text.x=element_text(),
        axis.title.x=element_blank(), 
        axis.title.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.line.y=element_blank())

# turn the residual plot into a ggplotGrob() object 
added_plot_grob <- ggplot2::ggplotGrob(added_plot)

# then add the residual plot to the main one as a custom annotation 
main_plot + annotation_custom(grob=added_plot_grob, 
                              xmin=4.0, xmax=4.35, ymin=1, ymax=5)

[Figure: scatterplot with fit line and overlaid residual histogram]

Then look at the documentation for ggplot2 and gridExtra to work out the rotation. Hope this helps!
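For the rotation step, one possible route is the grid package: give the histogram grob a rotated viewport before passing it to annotation_custom(). What follows is a minimal sketch under stated assumptions, not the method used in the cited paper. The angle (rotation_deg) and the annotation coordinates are placeholders picked by eye; the on-screen angle of the fitted line depends on the axis ranges and the aspect ratio of the plotting device, so expect to tune them. The rotated grob is wrapped in grobTree() because, as far as I can tell, annotation_custom() assigns its own viewport to the grob it receives, which would otherwise replace the rotated one.

```r
library(ggplot2)
library(grid)

alligator <- data.frame(
  lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
               3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
               3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
reg <- lm(lnWeight ~ lnLength, data = alligator)

# main scatterplot with the least-squares line
main_plot <- ggplot(alligator, aes(x = lnLength, y = lnWeight)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# bare histogram of the residuals (no axes, no background)
hist_plot <- ggplot(data.frame(resid = residuals(reg)), aes(x = resid)) +
  geom_histogram(bins = 5) +
  theme_void()

# ASSUMPTION: angle in degrees, counterclockwise, chosen by eye.
# If the line is drawn at roughly 45 degrees on screen, -45 puts the
# histogram baseline roughly perpendicular to it.
rotation_deg <- -45

# attach a rotated viewport to the histogram grob ...
rotated <- editGrob(ggplotGrob(hist_plot),
                    vp = viewport(angle = rotation_deg))
# ... and wrap it so the rotation sits on a child grob
wrapped <- grobTree(rotated)

# ASSUMPTION: inset position in data coordinates (an empty corner here)
rotated_plot <- main_plot +
  annotation_custom(grob = wrapped,
                    xmin = 3.5, xmax = 3.9, ymin = 5.0, ymax = 6.4)
rotated_plot
```

Note that reg$residuals are vertical offsets, so the rotated histogram shows the same values as before, just drawn at an angle.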