I am trying to fit a normal curve to my histogram using ggplot2

Time: 2018-09-17 02:47:16

Tags: r ggplot2

I want to fit a normal curve to my distribution. I have seen a few examples, but I always get errors.

Below is some of the data I am working with. Sorry, I had to change the variable names for confidentiality reasons.

chartA <- structure(list(X = c(29L, 22L, 27L, 26L, 25L, 26L, 16L, 30L,
31L, 32L, 29L, 19L, 18L, 26L, 25L, 22L, 23L, 27L, 21L, 16L, 18L, 
25L, 21L, 23L, 22L, 25L, 29L, 23L, 20L, 25L, 25L, 21L, 30L, 27L, 
25L, 18L, 27L, 25L, 27L, 28L, 26L, 20L, 20L, 20L, 23L, 33L, 27L, 
17L, 21L, 19L, 26L, 26L, 20L, 25L, 30L, 17L, 31L, 26L, 25L, 20L, 
27L, 21L, 21L, 21L, 26L, 30L, 23L, 22L, 28L, 17L, 22L, 16L, 25L, 
19L, 14L, 19L, 29L, 27L, 21L, 31L, 24L, 20L, 14L, 23L, 21L, 26L, 
29L, 24L, 27L, 17L, 21L, 19L, 21L, 22L, 22L, 26L, 26L, 34L, 28L, 
34L, 26L, 23L, 24L, 25L, 21L, 19L, 18L, 19L, 20L, 22L, 21L, 20L, 
22L, 19L, 22L, 27L, 25L, 20L, 23L, 19L, 32L, 25L, 27L, 23L, 30L, 
31L, 31L, 23L, 25L, 21L, 26L, 17L, 24L, 16L, 29L, 20L, 31L, 28L, 
28L, 26L, 26L, 29L, 33L, 23L, 19L, 24L, 23L, 20L, 20L, 28L, 19L, 
26L, 25L, 24L, 19L, 21L, 22L, 21L, 31L, 21L, 16L, 23L, 29L, 25L, 
24L, 19L, 19L, 19L, 23L, 25L, 26L, 19L, 22L, 24L, 29L, 19L, 15L, 
22L, 17L, 23L, 27L, 23L, 16L, 23L, 28L, 21L, 30L, 19L, 24L, 23L, 
24L, 31L, 23L, 28L, 21L, 25L, 29L, 22L, 28L, 20L, 20L, 28L, 29L, 
27L, 27L, 22L, 22L, 29L, 31L, 22L, 24L, 15L, 20L, 34L, 23L, 24L, 
21L, 25L, 24L, 20L, 26L, 24L, 16L, 25L, 27L, 28L, 26L, 24L, 22L, 
21L, 27L, 25L, 24L, 26L, 16L, 29L, 18L, 26L, 23L, 26L, 27L, 16L, 
33L, 23L, 31L, 23L, 21L, 22L, 22L, 20L, 19L, 24L, 25L, 28L, 24L, 
26L, 30L, 26L, 29L, 17L, 29L, 19L, 28L, 25L, 24L, 23L, 25L, 19L, 
25L, 24L, 23L, 20L, 18L, 20L, 21L, 20L, 24L, 32L, 19L, 19L, 22L, 
21L, 22L, 22L, 20L, 25L, 17L, 28L, 25L, 22L, 19L, 24L, 15L, 26L, 
26L, 30L, 29L, 20L, 26L, 25L, 27L, 24L, 26L, 21L, 23L, 22L, 13L, 
21L, 22L, 25L, 23L, 23L, 15L, 20L, 29L, 26L, 23L, 23L, 20L, 23L, 
21L, 30L, 16L, 21L, 19L, 20L, 26L, 30L, 20L, 20L, 23L, 22L, 24L, 
19L, 21L, 24L, 19L, 26L, 32L, 20L, 19L, 24L, 20L, 29L, 21L, 20L, 
26L, 22L, 22L, 23L, 27L, 24L, 24L, 25L, 21L, 30L, 21L, 23L, 27L, 
21L, 27L, 23L, 24L, 22L, 20L, 18L, 30L, 20L, 23L, 21L, 24L, 28L, 
22L, 17L, 21L, 26L, 22L, 24L, 25L, 27L, 24L, 21L, 19L, 24L, 18L, 
29L, 21L, 23L, 19L, 16L, 21L, 24L, 19L, 24L, 26L, 27L, 22L, 17L, 
16L, 25L, 21L, 19L, 27L, 33L, 24L, 26L, 26L, 27L, 23L, 24L, 24L, 
24L, 20L, 23L, 21L, 19L, 23L, 32L, 17L, 16L, 16L, 25L, 23L, 21L, 
22L, 25L, 19L, 23L, 24L, 18L, 26L, 24L, 21L, 20L, 27L, 23L, 22L, 
28L, 20L, 21L, 20L, 22L, 19L, 27L, 22L, 21L, 24L, 18L, 24L, 21L, 
17L, 22L, 24L, 18L, 19L, 21L, 27L, 28L, 23L, 17L, 28L, 20L, 23L, 
22L, 21L, 20L, 30L, 30L, 23L, 24L, 25L, 23L, 24L, 29L, 17L, 22L, 
28L, 14L, 23L, 21L, 23L, 21L, 20L, 25L, 26L, 24L, 23L, 22L, 21L, 
26L, 30L, 19L, 22L, 22L, 19L, 19L, 26L, 24L, 22L, 20L, 22L, 27L, 
19L, 27L, 18L, 20L, 19L, 22L, 30L, 14L, 23L, 27L, 23L, 16L, 20L, 
20L, 20L, 25L, 19L, 21L, 21L, 23L, 18L, 24L, 22L, 26L, 22L, 17L, 
21L, 21L, 22L, 19L, 21L, 27L, 23L, 20L, 28L, 26L, 26L, 24L, 20L, 
30L, 27L, 21L, 25L, 20L, 25L, 25L, 24L, 19L, 25L, 25L, 19L, 22L, 
26L, 16L, 28L, 21L, 23L, 25L, 26L, 14L, 24L, 25L, 19L, 26L, 27L, 
19L, 20L, 23L, 23L, 28L, 19L, 20L, 23L, 27L, 24L, 25L, 23L, 24L, 
25L, 21L, 28L, 20L, 26L, 29L, 24L, 18L, 20L, 22L, 32L, 35L, 25L, 
21L, 24L, 13L, 17L, 21L, 28L, 25L, 19L, 22L, 27L, 28L, 26L, 19L, 
27L, 20L, 22L, 24L, 24L, 31L, 23L, 29L, 28L, 20L, 19L, 28L, 23L, 
21L, 25L, 21L, 22L, 27L, 25L, 21L, 23L, 25L, 26L, 27L, 26L, 25L, 
29L, 33L, 25L, 21L, 19L, 23L, 19L, 19L, 31L, 21L, 23L, 22L, 28L, 
27L, 21L, 22L, 19L, 25L, 26L, 24L, 15L, 21L, 32L, 27L, 27L, 25L, 
23L, 28L, 23L, 21L, 27L, 16L, 17L, 23L, 29L, 22L, 21L, 30L, 26L, 
20L, 21L, 27L, 19L, 29L, 22L, 26L, 19L, 21L, 28L, 29L, 22L, 17L, 
30L, 26L, 25L, 20L, 20L, 24L, 28L, 25L, 19L, 26L, 20L, 25L, 18L, 
17L, 26L, 27L, 28L, 22L, 18L, 23L, 29L, 26L, 27L, 33L, 20L, 23L, 
20L, 16L, 23L, 30L, 25L, 27L, 26L, 26L, 22L, 26L, 20L, 24L, 22L, 
25L, 23L, 28L, 24L, 21L, 22L, 27L, 24L, 27L, 21L, 30L, 33L, 13L, 
26L, 20L, 24L, 20L, 22L, 21L, 21L, 32L, 19L, 31L, 28L, 21L, 26L, 
19L, 23L, 22L, 23L, 22L, 21L, 24L, 16L, 25L, 20L, 27L, 21L, 24L, 
24L, 27L, 22L, 25L, 28L, 27L, 28L, 28L, 18L, 16L, 23L, 22L, 24L, 
23L, 23L, 29L, 23L, 18L, 22L, 24L, 27L, 28L, 23L, 22L, 15L, 27L, 
23L, 24L, 17L, 31L, 24L, 17L, 16L, 28L, 27L, 27L, 23L, 23L, 30L, 
21L, 24L, 16L, 25L, 16L, 23L, 27L, 20L, 23L, 19L, 25L, 18L, 22L, 
24L, 19L, 22L, 27L, 22L, 18L, 13L, 19L, 26L, 23L, 25L, 29L, 17L, 
24L, 30L, 18L, 27L, 16L, 22L, 29L, 16L, 19L, 21L, 21L, 22L, 21L, 
17L, 19L, 20L, 31L, 30L, 25L, 25L, 23L, 21L, 26L, 20L, 22L, 20L, 
21L, 25L, 22L, 21L, 24L, 13L, 24L, 24L, 23L, 24L, 23L, 19L, 27L, 
22L, 37L, 22L, 25L, 23L, 27L, 14L, 26L, 21L, 19L, 21L, 22L, 29L, 
26L, 23L, 21L, 20L, 14L, 23L, 26L, 21L, 26L, 17L, 21L, 19L, 23L, 
14L, 25L, 18L, 22L, 28L, 29L, 21L, 27L, 25L, 28L, 24L, 24L, 24L, 
30L, 22L, 24L, 21L, 24L, 16L, 25L, 18L, 20L, 19L, 25L, 17L, 20L, 
21L, 18L, 19L, 26L, 23L, 24L, 20L, 21L, 31L, 27L, 23L, 22L, 16L, 
21L, 23L, 20L, 23L, 29L, 25L, 23L, 24L, 30L, 26L, 27L, 22L, 14L, 
12L, 19L, 23L, 22L, 16L, 15L, 23L, 19L, 24L, 25L, 15L, 21L, 30L, 
13L, 27L, 21L, 17L, 25L, 29L, 22L, 22L, 21L, 31L, 22L, 29L, 30L, 
20L, 21L, 21L, 22L, 26L, 23L, 18L, 15L, 17L, 27L, 20L, 26L, 25L, 
25L, 25L, 27L, 20L, 25L, 27L, 24L, 21L, 25L, 25L, 18L, 31L, 23L, 
26L, 22L, 29L, 20L), row.names = c(NA, 
-1000L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(
    cols = list(X = structure(list(), class = c("collector_integer", 
    "collector")), Y = structure(list(), class = c("collector_integer", 
    "collector")), Z = structure(list(), class = c("collector_integer", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector"))), class = "col_spec"))

This is my first post, so I have tried to strip my code down to the bare bones:

library(ggplot2)

ggplot(data = chartA, mapping = aes(x = X)) +
  geom_histogram(bins = 20, color = "white", fill = "steelblue") +
  xlab("Values of X") +
  ylab("Frequency of X Values") +
  ggtitle("Histogram of X with Normal Curve")

Where exactly in this code should I add the normal curve?

1 Answer:

Answer 0 (score: 2)

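Tung's answer is probably what you want, but it does not actually draw a normal curve: geom_density() only smooths the histogram and makes no assumption that the data are normally distributed. To draw a true normal curve, use stat_function() to plot the density of a normal distribution with the observed mean and standard deviation: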

# Adapting Tung's answer, adding the normal distribution density in purple
ggplot(data = chartA, mapping = aes(x = X)) +
    # after_stat(density) replaces the deprecated ..density.. (needs ggplot2 >= 3.3.0)
    geom_histogram(aes(y = after_stat(density)),
                   alpha = 0.8, bins = 20,
                   color = "white", fill = "steelblue",
                   position = "identity"
    ) +
    geom_density(alpha = .2) +
    stat_function(fun = function(x) {
        dnorm(x, mean = mean(chartA$X), sd = sd(chartA$X))
    }, colour = "purple") +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    xlab("Values of X") +
    ylab("Density") +
    ggtitle("Histogram of X with Normal Curve") +
    theme_classic(base_size = 14)
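
If you would rather keep counts on the y-axis, as in the question's original plot, one option is to rescale the normal density by the number of observations times the bin width, since that product is the expected count per bin. The following is only a sketch under assumptions: `demo` is a simulated stand-in for `chartA`, `normal_counts()` is a hypothetical helper name, and an explicit `binwidth = 1` is used instead of `bins = 20` so that the scaling factor is known.

```r
library(ggplot2)

# Hypothetical stand-in for chartA: simulated integers, not the asker's data
set.seed(1)
demo <- data.frame(X = as.integer(round(rnorm(1000, mean = 23, sd = 4))))

# Assumed bin width: one bin per integer value
bw <- 1

# Expected count per bin under a normal distribution:
# density * number of observations * bin width
normal_counts <- function(x) {
  dnorm(x, mean = mean(demo$X), sd = sd(demo$X)) * nrow(demo) * bw
}

p <- ggplot(data = demo, mapping = aes(x = X)) +
  geom_histogram(binwidth = bw, color = "white", fill = "steelblue") +
  stat_function(fun = normal_counts, colour = "purple") +
  xlab("Values of X") +
  ylab("Frequency of X Values") +
  ggtitle("Histogram of X with Normal Curve")

print(p)
```

If you keep `bins = 20`, the bin width is no longer 1, so the multiplier in `normal_counts()` would have to match whatever width ggplot2 actually uses; setting `binwidth` explicitly avoids having to guess it.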