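I am trying to produce a stylized version of a large, noisy underlying dataset, together with a regression line. To do this, I created variable-width bins, aiming for an equal number of observations in each bin, like this: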
library(mltools)
complete$mtgHours_evenBins <- bin_data(complete$mtgHoursPerUser_mean, bins=500, binType = "quantile")
Then I take the midpoint of each bin and compute new aggregated means per bin, like this:
library(dplyr)  # provides %>%
complete$mtgHours_evenBins_midpoints <- midpoints(complete$mtgHours_evenBins)
# Generate new aggregated means after grouping by the new bins
complete <- complete %>%
  dplyr::group_by(mtgHours_evenBins) %>%
  dplyr::mutate(even_binned_rev_2016_log_mean = mean(rev_2016_log))
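For reference, here is a minimal base-R sketch of what I mean by equal-count bins with midpoints and per-bin means. It runs on synthetic data and does not use mltools; the names x, y, n_bins, and binned are made up for illustration and are not columns of my real data.

```r
# Synthetic stand-in for the real data (hypothetical, not from `complete`)
set.seed(42)
n <- 10000
x <- rexp(n, rate = 0.2)                   # skewed predictor
y <- 1 + 0.3 * x - 0.01 * x^2 + rnorm(n)   # noisy response

# Quantile breaks give variable-width bins with roughly equal counts
n_bins <- 50
breaks <- unname(quantile(x, probs = seq(0, 1, length.out = n_bins + 1)))
bin <- cut(x, breaks = breaks, include.lowest = TRUE)

# One row per bin: midpoint of the bin edges, mean of y, and bin size
binned <- data.frame(
  midpoint = (head(breaks, -1) + tail(breaks, -1)) / 2,
  y_mean   = as.numeric(tapply(y, bin, mean)),
  n_obs    = as.numeric(table(bin))
)
```

Note that this sketch ends up with one row per bin, whereas my mutate() call above keeps one row per original observation.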
I can plot the result like this:
library(ggplot2)
ggplot(data = complete, aes(x = mtgHours_evenBins_midpoints, y = even_binned_rev_2016_log_mean)) +
  geom_point(color = 'blue') +
  stat_smooth(data = complete, aes(x = mtgHours_evenBins_midpoints, y = binned_rev_2016_log_mean),
              method = "lm", formula = y ~ x + I(x^2), size = 1, color = "red", se = TRUE)
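However, whether I use stat_smooth or geom_smooth, no confidence interval is drawn. Is this because each bin contains the same number of observations? Is there something I am missing?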
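One check that does not involve ggplot2 is to fit the same quadratic with lm() and ask predict() for a pointwise confidence interval. The sketch below does this on a made-up binned summary; binned, midpoint, y_mean, and grid are hypothetical names, and the data are simulated, so it only shows the mechanics and says nothing about my actual dataset.

```r
# Hypothetical binned summary: one row per bin, simulated values
set.seed(1)
binned <- data.frame(midpoint = sort(runif(50, 0, 20)))
binned$y_mean <- 1 + 0.3 * binned$midpoint - 0.01 * binned$midpoint^2 +
  rnorm(50, sd = 0.1)

# Same model form as the stat_smooth() formula y ~ x + I(x^2)
fit <- lm(y_mean ~ midpoint + I(midpoint^2), data = binned)

# Pointwise 95% confidence interval for the fitted curve on a regular grid
grid <- data.frame(
  midpoint = seq(min(binned$midpoint), max(binned$midpoint), length.out = 100)
)
ci <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Width of the band at each grid point; ci has columns fit, lwr, upr
band_width <- ci[, "upr"] - ci[, "lwr"]
summary(band_width)
```

Comparing band_width with the range of the y axis would show whether an interval exists but is too narrow to see at the plot's scale.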