在我的真实数据上覆盖理论正态分布

时间:2020-08-06 00:20:18

标签: r ggplot2

我有一些图表可以绘制某些收入数据的对数正态分布。我想将理论正态分布作为图层/背景进行叠加,只是为了突出显示使我的对数正态尝试使正态分布与实际正态分布之间的任何偏斜或差异。示例代码:

library(ggplot2)
library(dplyr)

f <- function(x) {
   y <- diamonds$price[diamonds$cut == x]
   paste(seq(-3, 3), scales::dollar(round(mean(y) + seq(-3, 3) * sd(y))), sep = "\n")
}

breaks <- as.vector(sapply(levels(diamonds$cut), f))

diamonds %>% 
    group_by(cut) %>% 
    mutate(z = scale(price) + 3 + 1000 * as.numeric(cut)) %>%
    ggplot(aes(z)) +
    geom_point(aes(x = z - 2, y = 1), alpha = 0) +
    geom_density() +
    scale_x_continuous(breaks =  as.vector(sapply(1:5 * 1000, "+", 0:6)), 
                       labels = breaks) +
    facet_wrap(vars(cut), scales = "free_x") +
  theme(text = element_text(size = 16),
        axis.text.x = element_text(size = 6))

看起来像: enter image description here

在这种情况下,diamonds $ price在视觉上不正常。有没有办法让我在每个图表上覆盖理论正态分布?

1 个答案:

答案 0 :(得分:2)

您可以制作第二个数据帧,该数据帧是每个切口的(偏移的)正常密度,然后将其添加到geom_line上。 crossing函数来自tidyr包,并在两个组件数据帧之间创建交叉联接:

library(ggplot2)
library(dplyr)
library(tidyr)

f <- function(x) {
  y <- diamonds$price[diamonds$cut == x]
  paste(seq(-3, 3), scales::dollar(round(mean(y) + seq(-3, 3) * sd(y))), sep = "\n")
}

breaks <- as.vector(sapply(levels(diamonds$cut), f))

x <- seq(-3, 3, length.out = 1000)

shifted_densities <- data.frame(
  cut = levels(diamonds$cut),
  mean = seq(1000, 5000, length.out = 5) + 3) %>% # group means based on your breaks
  crossing(
    data.frame(x = x, 
               p = dnorm(x))) %>%
  mutate(x = x + mean) # shift everything over to the right center

diamonds %>% 
  group_by(cut) %>% 
  mutate(z = scale(price) + 3 + 1000 * as.numeric(cut)) %>%
  ggplot(aes(z)) +
  geom_point(aes(x = z - 2, y = 1), alpha = 0) +
  geom_density() +
  scale_x_continuous(breaks =  as.vector(sapply(1:5 * 1000, "+", 0:6)), 
                     labels = breaks) +
  facet_wrap(vars(cut), scales = "free_x") +
  theme(text = element_text(size = 16),
        axis.text.x = element_text(size = 6)) +
  geom_line(aes(x, p), data = shifted_densities, col = "red")

enter image description here