我有一些图表可以绘制某些收入数据的对数正态分布。我想将理论正态分布作为图层/背景进行叠加,只是为了突出显示使我的对数正态尝试使正态分布与实际正态分布之间的任何偏斜或差异。示例代码:
library(ggplot2)
library(dplyr)
f <- function(x) {
y <- diamonds$price[diamonds$cut == x]
paste(seq(-3, 3), scales::dollar(round(mean(y) + seq(-3, 3) * sd(y))), sep = "\n")
}
breaks <- as.vector(sapply(levels(diamonds$cut), f))
diamonds %>%
group_by(cut) %>%
mutate(z = scale(price) + 3 + 1000 * as.numeric(cut)) %>%
ggplot(aes(z)) +
geom_point(aes(x = z - 2, y = 1), alpha = 0) +
geom_density() +
scale_x_continuous(breaks = as.vector(sapply(1:5 * 1000, "+", 0:6)),
labels = breaks) +
facet_wrap(vars(cut), scales = "free_x") +
theme(text = element_text(size = 16),
axis.text.x = element_text(size = 6))
在这种情况下,diamonds $ price在视觉上不正常。有没有办法让我在每个图表上覆盖理论正态分布?
答案 0 :(得分:2)
您可以制作第二个数据帧,该数据帧是每个切口的(偏移的)正常密度,然后将其添加到geom_line
上。 crossing
函数来自tidyr
包,并在两个组件数据帧之间创建交叉联接:
library(ggplot2)
library(dplyr)
library(tidyr)
f <- function(x) {
y <- diamonds$price[diamonds$cut == x]
paste(seq(-3, 3), scales::dollar(round(mean(y) + seq(-3, 3) * sd(y))), sep = "\n")
}
breaks <- as.vector(sapply(levels(diamonds$cut), f))
x <- seq(-3, 3, length.out = 1000)
shifted_densities <- data.frame(
cut = levels(diamonds$cut),
mean = seq(1000, 5000, length.out = 5) + 3) %>% # group means based on your breaks
crossing(
data.frame(x = x,
p = dnorm(x))) %>%
mutate(x = x + mean) # shift everything over to the right center
diamonds %>%
group_by(cut) %>%
mutate(z = scale(price) + 3 + 1000 * as.numeric(cut)) %>%
ggplot(aes(z)) +
geom_point(aes(x = z - 2, y = 1), alpha = 0) +
geom_density() +
scale_x_continuous(breaks = as.vector(sapply(1:5 * 1000, "+", 0:6)),
labels = breaks) +
facet_wrap(vars(cut), scales = "free_x") +
theme(text = element_text(size = 16),
axis.text.x = element_text(size = 6)) +
geom_line(aes(x, p), data = shifted_densities, col = "red")