Smoothing a density plot with adjust

Time: 2019-10-04 17:01:38

Tags: r ggplot2

With the classic diamonds dataset, adjust can be used to smooth the plot. Why doesn't this work for my dataset? adjust is ignored.

library(tidyverse)

df <- structure(list(
  year = c(1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971,
           1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971),
  age_group = structure(2:19, .Label = c(
    "All ages", "0 to 4 years", "5 to 9 years", "10 to 14 years",
    "15 to 19 years", "20 to 24 years", "25 to 29 years", "30 to 34 years",
    "35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years",
    "55 to 59 years", "60 to 64 years", "65 to 69 years", "70 to 74 years",
    "75 to 79 years", "80 to 84 years", "85 to 89 years", "90 to 94 years",
    "95 to 99 years", "100 years and over", "Median age"
  ), class = "factor"),
  population = c(1836149, 2267794, 2329323, 2164092, 1976914, 1643264,
                 1342744, 1286302, 1284154, 1252545, 1065664, 964984,
                 785693, 626521, 462065, 328583, 206174, 101117),
  age_min = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85),
  age_max = c(4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89)
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -18L))

ggplot(diamonds, aes(carat)) +
  geom_density(adjust = 1)

ggplot(diamonds, aes(carat)) +
  geom_density(adjust = 5)

# df is the data frame defined above
ggplot(df, aes(x = age_min, y = population)) +
  geom_density(stat = "identity", adjust = 5)
# Warning: Ignoring unknown parameters: adjust

1 Answer:

Answer 0 (score: 1)

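The most common use of geom_density() is as a smoothed histogram: it shows the density of a vector across its values, for example how much more often one value occurs than another. With its default stat, it computes a kernel density estimate of your x values and outputs the density to y; the amount of smoothing is controlled by bw and adjust, where adjust is a multiplier on the bandwidth. Using stat = "identity" as you do here bypasses that estimation and plots the y you supply, i.e. population, so there is no bandwidth left for adjust to act on and the parameter is ignored.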
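To see what adjust does on its own, here is a minimal sketch using base R's density(), which takes the same bw and adjust arguments. The vector x is synthetic and only for illustration; it is not the population data from the question.

```r
# Illustrative only: x is made-up data, not the asker's dataset.
set.seed(42)
x <- rnorm(200)

d1 <- density(x, adjust = 1)  # bandwidth as chosen by the default rule
d5 <- density(x, adjust = 5)  # same rule, bandwidth multiplied by 5

# adjust scales the bandwidth that bw selects
c(bw_adjust_1 = d1$bw, bw_adjust_5 = d5$bw)

# A wider bandwidth flattens the curve, so its peak is lower
max(d1$y) > max(d5$y)
```

Both calls start from raw observations. That is the ingredient missing when a pre-summarised y is passed through stat = "identity".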

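Your data is currently in summarised form: each row holds the number of people in an age group, rather than one row per person. We can undo that with tidyr::uncount, which duplicates each row as many times as you specify for that row. So if a row has a population of 5 million, we could copy it 5 million times. That is almost certainly overkill, though; we can instead copy each row a proportionally smaller number of times: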

df %>% uncount(floor(population / 1E4))
# Produces one row for every 10,000 people; in this case 2,183 rows, corresponding to the
# 22M total population in the data.
# floor() keeps the weights whole numbers, which some tidyr versions require.
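To make the expansion concrete, here is a small sketch on a hypothetical two-row table (the names toy, group and n are invented for illustration), alongside a base R expression that should give the same rows:

```r
library(tidyr)

# Hypothetical toy table: one row per group with a count column
toy <- data.frame(group = c("a", "b"), n = c(3L, 1L))

# uncount() repeats each row n times and drops the count column by default
expanded <- uncount(toy, n)
expanded

# Base R equivalent: repeat each row index n times, keep only the group column
expanded_base <- toy[rep(seq_len(nrow(toy)), times = toy$n), "group", drop = FALSE]
expanded_base
```

After expansion the data has one row per (scaled) observation, which is the shape the default density stat expects.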

With the data in that shape, adjust controls the smoothing in geom_density:

# floor() keeps the weights whole numbers, which some tidyr versions require
ggplot(df %>% uncount(floor(population / 1E4)), aes(x = age_min)) +
  geom_density(adjust = 0.7)


# floor() keeps the weights whole numbers, which some tidyr versions require
ggplot(df %>% uncount(floor(population / 1E4)), aes(x = age_min)) +
  geom_density(adjust = 1.5)

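If expanding the rows feels wasteful, one possible alternative is to keep the summarised table and pass the counts through the weight aesthetic, which the default density stat understands. This is only a sketch: the data frame pop is a trimmed copy of the question's data, and as far as I can tell the bandwidth is then chosen from the 18 unweighted age_min values, so a given adjust will not produce the same curve as in the uncount version.

```r
library(ggplot2)

# Age-group lower bounds and 1971 populations copied from the question's data
pop <- data.frame(
  age_min = seq(0, 85, by = 5),
  population = c(1836149, 2267794, 2329323, 2164092, 1976914, 1643264,
                 1342744, 1286302, 1284154, 1252545, 1065664, 964984,
                 785693, 626521, 462065, 328583, 206174, 101117)
)

# Weights are normalised to sum to 1 so each age group counts in proportion
# to its population; the default stat is kept, so adjust is not ignored
p <- ggplot(pop, aes(x = age_min, weight = population / sum(population))) +
  geom_density(adjust = 1.5)
p
```

Because stat = "identity" is not used here, changing adjust should change the curve without any warning about an unknown parameter.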