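I am trying to add some custom features to a binned scatter plot using ggplot2. My original way of producing the binned scatter was to use stat_summary_bin(fun.y="mean"). This seems to produce reasonable bins, but when I try to reproduce it by binning manually, I get slightly different results, especially in the right tail.
Can someone help me figure out how the binning is done in stat_summary_bin? I need to work out whether this is a reliable form of binned scatter plot that I can use...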
library(tidyverse)
library(mltools)
#>
#> Attaching package: 'mltools'
#> The following object is masked from 'package:tidyr':
#>
#> replace_na
x = runif(1000, 0, 10)
y = x + rnorm(1000, 0.5, 2)
plot(x,y)
df <- data.frame(x = x, y = y)
p <- df %>%
  ggplot(aes(x = x, y = y)) +
  stat_summary_bin(aes(color = "stat summary"), fun.y = "mean", size = 2.5, geom = "point", bins = 20)
p
## Attempt 1 at binning
df$x_bin <- mltools::bin_data(df$x, bins = 20, binType = "explicit")
df_binned <- df %>%
  group_by(x_bin) %>%
  mutate(
    x_binned = mean(x),
    y_binned = mean(y)
  ) %>%
  ungroup()
p <- p + geom_point(data = df_binned, aes(x = x_binned, y = y_binned, color = "manual bin"), size = 2.5)
p
## Attempt 2 at binning
xbreaks = quantile(df$x, probs = seq(0, 1, 0.05))
df_binned$x_bin_2 <- cut(df$x, xbreaks, include.lowest = TRUE)
df_binned <- df_binned %>%
  group_by(x_bin_2) %>%
  mutate(
    x_binned2 = mean(x),
    y_binned2 = mean(y)
  ) %>%
  ungroup()
p <- p + geom_point(data = df_binned, aes(x = x_binned2, y = y_binned2, color = "2nd manual bin"), size = 2.5)
p
Created on 2018-09-09 by the reprex package (v0.2.0).
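One way to see what stat_summary_bin actually computed, rather than guessing, is to pull the layer's data back out of the plot with layer_data(). The following is only a minimal sketch under a few assumptions: the seed is added for reproducibility and is not in the post, the data frame is rebuilt so the block runs on its own, and `fun` is used in place of `fun.y` (the argument was renamed in ggplot2 3.3.0, so older versions would need `fun.y`).

```r
library(ggplot2)

set.seed(42)  # assumption: seed added for reproducibility, not in the original post
df <- data.frame(x = runif(1000, 0, 10))
df$y <- df$x + rnorm(1000, 0.5, 2)

# `fun` is the ggplot2 >= 3.3.0 name for the argument written as `fun.y` in the post
p <- ggplot(df, aes(x = x, y = y)) +
  stat_summary_bin(fun = "mean", geom = "point", bins = 20)

# layer_data() returns the data frame ggplot2 computed for the given layer
binned <- layer_data(p, 1)

names(binned)          # which columns the stat produced
nrow(binned)           # how many bins were actually drawn
binned[, c("x", "y")]  # plotted x position and summary value per bin
diff(binned$x)         # spacing between consecutive plotted x positions
```

If diff(binned$x) comes out constant, the plotted positions are evenly spaced, which would point to equal-width bins rather than the quantile (equal-count) breaks used in Attempt 2. Comparing binned$x with the per-bin mean(x) from the manual attempts, and nrow(binned) with the requested 20 bins, may also help show where the right-tail difference comes from.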