Question

library(ggplot2)
library(dplyr)

test_data <- data.frame(x = runif(20, 0, 10), y = runif(20, 0, 10))

ggplot(test_data, aes(x)) + geom_histogram(binwidth = 1)

test_data <- test_data %>% arrange(x) 
test_list <- list()
for(i in 1:10){
    test_list[[i]] <- test_data %>% filter( x < i & x > i-1)
}
test_list 

test_means <- c()
for(i in 1:10){test_means[i] <- mean(test_list[[i]]$y)}
test_means

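Hey y'all,

I'm trying to learn more about histograms and ggplot2. I want to draw a histogram of variable x, then compute the mean of variable y for the subgroup of rows that falls in each bin, and finally display that mean above the corresponding bin in the histogram.

The question has two parts: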

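a) Is there a ggplot2 function (or any other function) that returns the mean of y for each bin subgroup? Right now all I can think of is a for() loop that iterates from the minimum to the maximum value of the x variable, one binwidth at a time. That isn't very clean or concise...

b) Does ggplot2 provide a way to place a value above the corresponding bin, such as the newly computed mean of y for each bin?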

Thanks for your time.

Answer 1

The stat_bin() function (which geom_histogram() calls) has nothing built in for this, but it isn't too hard (or too un-[clean|concise]) to do what you're asking:

library(ggplot2)
library(dplyr)

set.seed(15) # reproducible

test_data <-  data.frame(x = runif(20, 0, 10), 
                         y = runif(20, 0, 10))

gg <- ggplot(test_data, aes(x)) + 
  geom_histogram(binwidth=1, fill="#2166ac", color="white")

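# For one bin (one row of the built histogram data), average y over the
# rows of test_data whose x falls inside the bin, and carry the bin count
mean_bin <- function(df) {
  filter(test_data, x > df$xmin & x <= df$xmax) %>% 
    summarise(μ=mean(y), ct=df$count[1]) %>% 
    mutate(μ=ifelse(is.nan(μ), NA, μ))  # empty bins give NaN; store NA
}

# ggplot_build(gg)$data[[1]] holds the computed bins (x, xmin, xmax, count)
group_by(ggplot_build(gg)$data[[1]], x) %>% 
  do(mean_bin(.)) %>%
  ungroup() -> bin_means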


gg <- gg + geom_text(data=bin_means, 
                     aes(x, ct, label=sprintf("μ(y)=%3.2f", μ)), 
                     vjust=0, nudge_y=0.1, size=2.5)
gg <- gg + scale_x_continuous(breaks=1:10)
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 4.5))
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid.major.x=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg

You need the <= df$xmax comparison because geom_histogram()/stat_bin() close the bins on the right by default.
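
To see the right-closed behaviour directly, here is a minimal sketch on toy data (the data frame d and the names right_closed and left_closed are made up for illustration). It assumes a ggplot2 version where stat_bin() accepts boundary and closed, and uses boundary = 0 so the bin edges land on the integers. A value sitting exactly on an edge (x = 2) should be counted in the bin that ends at 2 under the default, and in the bin that starts at 2 with closed = "left".

```r
library(ggplot2)

# Toy data (hypothetical): one value sits exactly on a bin edge (x = 2)
d <- data.frame(x = c(0.5, 1.5, 2, 2.5))

# Default: bins are closed on the right, so x = 2 belongs to (1, 2]
right_closed <- layer_data(
  ggplot(d, aes(x)) + geom_histogram(binwidth = 1, boundary = 0)
)

# closed = "left": bins are closed on the left, so x = 2 belongs to [2, 3)
left_closed <- layer_data(
  ggplot(d, aes(x)) + geom_histogram(binwidth = 1, boundary = 0, closed = "left")
)

# Compare which bin picked up the boundary value
right_closed[, c("xmin", "xmax", "count")]
left_closed[, c("xmin", "xmax", "count")]
```

If you switch the histogram to closed = "left", the filter in mean_bin() would have to change to x >= df$xmin & x < df$xmax to stay consistent with the bars.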

Answer 2

You can try base R:
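
As a rough sketch of what a base R approach could look like (not necessarily what this answer originally used): hist() computes the bins, cut() assigns each x to a bin with the same right-closed convention, and tapply() averages y per bin. The breaks at 0:10, the seed, and the names h, bin, bin_means and has_data are assumptions made for this example.

```r
set.seed(15)  # same seed as Answer 1, only so the example is reproducible

test_data <- data.frame(x = runif(20, 0, 10),
                        y = runif(20, 0, 10))

# Compute the bins without drawing; breaks at 0, 1, ..., 10 are an assumption
h <- hist(test_data$x, breaks = 0:10, plot = FALSE)

# Assign each x to its bin using hist()'s defaults:
# right-closed intervals, with the lowest break included
bin <- cut(test_data$x, breaks = h$breaks, right = TRUE, include.lowest = TRUE)

# Mean of y per bin; bins with no observations come back as NA
bin_means <- tapply(test_data$y, bin, mean)

# Draw the histogram, then write each mean just above its bar
plot(h, col = "grey80", border = "white",
     ylim = c(0, max(h$counts) + 1), main = "", xlab = "x")
has_data <- !is.na(bin_means)
text(h$mids[has_data], h$counts[has_data],
     labels = sprintf("%.2f", bin_means[has_data]), pos = 3, cex = 0.7)
```

Because cut() is given h$breaks, the grouping always matches the bars that hist() draws, whatever breaks you choose.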


ggplot: histogram of variable x, showing the mean of variable y above each bin
