Question

I have data with a two-level factor, and I want to plot it with ggplot2 as overlapping histograms.

My data:

set.seed(1)
df <- data.frame(y = c(rnorm(1000),rnorm(10)), group = c(rep("A",1000),rep("B",10)))

My plot:

library(ggplot2)
ggplot(df, aes(y, fill = group)) + geom_histogram(alpha = 0.5, position = "identity")

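The problem is that groups A and B contain very different numbers of points (1000 versus 10), so plotting them together with the same binwidth is not ideal.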

In fact, it emits a warning:

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Is there a way to plot overlapping histograms with different bin widths?

Answer 1

You can also split the data by factor level and apply a different binwidth to each group:

library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(y = c(rnorm(1000), rnorm(10)), 
                 group = c(rep("A", 1000), rep("B", 10)))

gg <- ggplot()
gg <- gg + geom_histogram(data=filter(df, group=="A"), 
                          aes(y, fill=group), 
                          alpha=0.5)
gg <- gg + geom_histogram(data=filter(df, group=="B"), 
                          aes(y, fill=group), 
                          binwidth=4, alpha=0.5)
gg
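
As a variation on the code above, here is a minimal sketch that sets an explicit binwidth on both layers, which should also silence the `bins = 30` message for group A. The values `binwidth_a` and `binwidth_b` are illustrative assumptions, not tuned recommendations, and base subsetting is used in place of dplyr's filter so that only ggplot2 is needed:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = c(rnorm(1000), rnorm(10)),
                 group = c(rep("A", 1000), rep("B", 10)))

# Illustrative bin widths (assumptions, not tuned values):
# narrow bins for the large group, wide bins for the small one.
binwidth_a <- 0.2
binwidth_b <- 1

# The shared aesthetics live in ggplot(); each layer gets its own data subset
# and its own binwidth.
gg <- ggplot(mapping = aes(y, fill = group)) +
  geom_histogram(data = df[df$group == "A", ],
                 binwidth = binwidth_a, alpha = 0.5) +
  geom_histogram(data = df[df$group == "B", ],
                 binwidth = binwidth_b, alpha = 0.5)
gg
```

Note that the bars still show raw counts, so group B remains much shorter than group A even with wider bins.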

Answer 2

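You want to use density, i.e. scale each histogram so that the area under it sums to 1. In base graphics, you can set freq=FALSE in the hist function. With ggplot2, you can do this: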

ggplot(df, aes(y, fill = group)) + geom_histogram(aes(y=..density..))

or

ggplot(df, aes(y, fill = group)) + geom_density()
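
To get the overlapping look from the question on a density scale, one possible sketch is shown below. It assumes ggplot2 3.3.0 or later, where `after_stat(density)` is the newer spelling of `..density..`, and it adds `position = "identity"` because `geom_histogram` stacks groups by default. The binwidth of 0.25 is an illustrative assumption:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = c(rnorm(1000), rnorm(10)),
                 group = c(rep("A", 1000), rep("B", 10)))

# after_stat(density) rescales each group's bars so their area sums to 1;
# position = "identity" overlays the groups instead of stacking them.
# binwidth = 0.25 is an illustrative value, not a recommendation.
p <- ggplot(df, aes(y, fill = group)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 0.25, alpha = 0.5, position = "identity")
p
```

Because each group is normalized separately, the 10 points of group B are drawn at a height comparable to the 1000 points of group A, although with so few points the B histogram will look sparse.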

Overlaying ggplot2 histograms with different binwidths