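I'm trying to make a histogram with density on the y-axis and overlay a density curve on top. This works fine as long as I don't group the data by another factor. When I add a fill, however, the output is not what I expected.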
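library(ggplot2)  # the movies data set now lives in the ggplot2movies package
# Vote-weighted histogram on a density scale, with a density curve on top
noFill = ggplot(movies, aes(x = rating, weight = votes / sum(votes))) +
  geom_histogram(aes(y = ..density..), binwidth = 1) +
  geom_density(alpha = .1, fill = "blue")
noFill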
Adding the fill:
filled = ggplot(movies, aes(x = rating, fill = mpaa,
                            weight = votes / sum(votes))) +
  geom_histogram(aes(y = ..density..),
                 position = "identity", alpha = .6,
                 binwidth = 1) +
  geom_density(alpha = .3)
filled
These are the plots I get:
I also get the following warnings from the "filled" code:
Warning messages:
1: In density.default(data$x, adjust = adjust, kernel = kernel, weight = data$weight, :
sum(weights) != 1 -- will not get true density
2: In density.default(data$x, adjust = adjust, kernel = kernel, weight = data$weight, :
sum(weights) != 1 -- will not get true density
3: In density.default(data$x, adjust = adjust, kernel = kernel, weight = data$weight, :
sum(weights) != 1 -- will not get true density
4: In density.default(data$x, adjust = adjust, kernel = kernel, weight = data$weight, :
sum(weights) != 1 -- will not get true density
5: In density.default(data$x, adjust = adjust, kernel = kernel, weight = data$weight, :
sum(weights) != 1 -- will not get true density
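Based on the warnings and the way the plot looks, I suspect the problem is that the weights, which are normalized over the whole data set, are applied to each fill group as they are instead of being computed per group. I'm sure the current behavior is desirable in many cases, but I want the density curves to follow the histograms. Can I get this behavior without having to compute the statistics by hand?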
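To make the suspicion concrete, here is a minimal sketch in base R. The toy data frame and the column names w_global and w_group are made up for illustration and are not part of movies. It only shows that weights normalized over the whole data set no longer sum to 1 inside each group, which would be consistent with the warning, and that renormalizing within each group restores a sum of 1. I have not confirmed that this is what geom_density does internally.

```r
# Toy stand-in for the movies data (hypothetical values, two mpaa groups)
df <- data.frame(
  rating = c(5, 6, 7, 8, 4, 9),
  votes  = c(10, 30, 60, 50, 25, 25),
  mpaa   = factor(c("PG", "PG", "PG", "R", "R", "R"))
)

# Weights normalized over the whole data set, as in aes(weight = votes / sum(votes))
df$w_global <- df$votes / sum(df$votes)

# Sum of those weights inside each group: each is below 1, they only add up to 1 together
group_sums <- tapply(df$w_global, df$mpaa, sum)

# Weights renormalized within each group, so every group sums to 1 on its own
df$w_group <- ave(df$votes, df$mpaa, FUN = function(v) v / sum(v))
group_sums_fixed <- tapply(df$w_group, df$mpaa, sum)

print(group_sums)
print(group_sums_fixed)
```

If this is the cause, mapping a per-group column such as w_group to the weight aesthetic might be one way around it, but that is an assumption I have not verified against the real data.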
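Edit: Here is a hack that further illustrates what I want. In this code I add each group's density curve as a separate layer and pass in a data frame containing only that group's data. This apparently forces the density to be computed per group rather than over all of the data.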
library(dplyr)  # for %>% and filter()
filled = ggplot(movies, aes(x = rating, fill = mpaa,
                            weight = votes / sum(votes))) +
  geom_histogram(aes(y = ..density..),
                 position = "identity", alpha = .6,
                 binwidth = 1)
# Add one density layer per mpaa level, each built from that group's rows only
for (grp in rev(levels(movies$mpaa))) {
  df = movies %>% filter(mpaa == grp)
  layer = geom_density(data = df, alpha = .3)
  filled = filled + layer
}
filled
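The "rev" in the for loop is there so that NC-17 is drawn last and is easier to see.
Here is the plot (I still don't have permission to post images):
Also, in case it is relevant, I will eventually remove the histogram entirely. All I want is the density for each group, so if there is a way to do this without the histogram, I would be happy with that too.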