Question

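I realize there are several posts asking how to plot two histograms side by side (bars next to each other in the same plot) or overlaid in R, and how to normalize the data. Following the advice I found, I can do one or the other, but not both at once.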

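Here is the setup. I have two data frames of different lengths and want to plot the volumes of the objects in each one as a histogram. For example, I want to compare how many objects in data frame 1 are between 0.1 and 0.2 um^3 with how many in data frame 2 are between 0.1 and 0.2 um^3, and so on. Either overlaid or side by side would be fine.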

Since one data frame has more measurements than the other, I obviously have to normalize, so I use:

read.csv(ctl)
read.csv(exp)
h1=hist(ctl$Volume....)
h2=hist(exp$Volume....)

#to normalize#

h1$density=h1$counts/sum(h1$counts)*100
plot(h1,freq=FALSE....)
h2$density=h2$counts/sum(h2$counts)*100
plot(h2,freq=FALSE....)

I have successfully overlaid the unnormalized data using this method: http://www.r-bloggers.com/overlapping-histogram-in-r/ and also this one: plotting two histograms together

But I'm stuck on how to overlay the normalized data.

Answer 1

ggplot2 makes it relatively straightforward to plot normalized histograms for groups of unequal size. Here's an example with fake data:

library(ggplot2)

# Fake data (two normal distributions)
set.seed(20)
dat1 = data.frame(x=rnorm(1000, 100, 10), group="A")
dat2 = data.frame(x=rnorm(2000, 120, 20), group="B")
dat = rbind(dat1, dat2)

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Unnormalized")

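# y=..density.. scales each group so that its bars have a total area of 1.
# In ggplot2 >= 3.4.0, after_stat(density) is the preferred spelling of ..density..
ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=..density..), breaks=seq(0,200,5), alpha=0.6,
                 position="identity", lwd=0.2) +
  ggtitle("Normalized")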

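For comparison, here is a minimal base-graphics sketch that stays closer to the code in the question. It assumes the fix is simply to give both hist() calls the same breaks and to draw the second histogram with add=TRUE. The names ctl_vol and exp_vol are hypothetical stand-ins for ctl$Volume and exp$Volume, and the data are fake.

```r
# Fake data standing in for the two Volume columns (assumption, not the real data)
set.seed(20)
ctl_vol <- rnorm(1000, 100, 10)
exp_vol <- rnorm(2000, 120, 20)

# Shared, equal-width breaks covering both samples so the bars line up
brks <- pretty(range(c(ctl_vol, exp_vol)), n = 40)

h1 <- hist(ctl_vol, breaks = brks, plot = FALSE)
h2 <- hist(exp_vol, breaks = brks, plot = FALSE)

# Same normalization as in the question: percent of observations per bin
h1$density <- h1$counts / sum(h1$counts) * 100
h2$density <- h2$counts / sum(h2$counts) * 100

# Semi-transparent fills so the overlap stays visible
col_ctl <- rgb(1, 0, 0, 0.4)
col_exp <- rgb(0, 0, 1, 0.4)

# freq = FALSE makes plot() draw the (overwritten) density values
ymax <- max(h1$density, h2$density)
plot(h1, freq = FALSE, col = col_ctl, ylim = c(0, ymax),
     main = "Normalized", xlab = "Volume", ylab = "Percent of observations")
plot(h2, freq = FALSE, col = col_exp, add = TRUE)
legend("topright", legend = c("ctl", "exp"), fill = c(col_ctl, col_exp))
```

Overwriting the density component like this only gives a sensible picture when all bins have the same width, which is why the breaks are generated once and reused.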

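If you want overlaid density plots instead, you can do that as well. adjust controls the bandwidth. Density plots are already normalized by default.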

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_density(alpha=0.4, lwd=0.8, adjust=0.5)

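To see what adjust does, here is a small sketch using stats::density() directly, on the assumption that geom_density relies on the same kind of kernel density estimate. The variable names are made up for illustration.

```r
# Fake sample (assumption: same shape as group A in the example)
set.seed(20)
x <- rnorm(1000, 100, 10)

d_default <- density(x)                # default bandwidth
d_narrow  <- density(x, adjust = 0.5)  # bandwidth multiplied by 0.5

# adjust is a multiplier on the bandwidth that would otherwise be used
bw_ratio <- d_narrow$bw / d_default$bw

# The curve is already normalized: the area under it is approximately 1
# (simple Riemann sum over the equally spaced evaluation grid)
area <- sum(d_narrow$y) * diff(d_narrow$x[1:2])

print(bw_ratio)
print(area)
```

A smaller adjust gives a wigglier curve that follows the data more closely, and a larger one gives a smoother curve. The area stays at about 1 either way, so groups of different size remain comparable.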

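Update: In response to your comment, the code below should do it. (..density..)/sum(..density..) makes the total density across both histograms add up to 1, so the density of each individual group adds up to 0.5. You therefore need to multiply by 2 so that each group on its own is normalized to 1. In general, you have to multiply by n, where n is the number of groups. This feels kludgy, and there may be a more elegant approach.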

library(scales) # For percent_format()

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=2*(..density..)/sum(..density..)), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  scale_y_continuous(labels=percent_format())

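The arithmetic behind the factor of 2 can be checked outside ggplot2. This is only a sketch with base hist(), assuming equal-width bins that cover all of the data. The variable names are made up for illustration.

```r
# Fake data mirroring the two groups above (assumption)
set.seed(20)
x_a <- rnorm(1000, 100, 10)
x_b <- rnorm(2000, 120, 20)

# Equal-width breaks covering both samples
brks <- pretty(range(c(x_a, x_b)), n = 40)

h_a <- hist(x_a, breaks = brks, plot = FALSE)
h_b <- hist(x_b, breaks = brks, plot = FALSE)

# With equal-width bins, each group's densities sum to 1 / binwidth,
# so each group contributes half of the pooled total
total   <- sum(h_a$density) + sum(h_b$density)
share_a <- sum(h_a$density / total)
share_b <- sum(h_b$density / total)

# Multiplying by the number of groups rescales each group to sum to 1
n_groups <- 2
frac_a <- n_groups * h_a$density / total
frac_b <- n_groups * h_b$density / total

print(c(share_a, share_b))
print(c(sum(frac_a), sum(frac_b)))
```

Under these assumptions the rescaled values are just the per-group proportions counts / sum(counts), which is the same quantity the question computes by hand (there expressed as a percentage).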

R: Normalize, then plot two histograms together in R
