Using the classic diamonds dataset, adjust can be used to smooth the plot. Why doesn't this work for my dataset? There, adjust is ignored.
library(tidyverse)
df <- structure(list(year = c(1971, 1971, 1971, 1971, 1971, 1971, 1971,
1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971
), age_group = structure(2:19, .Label = c("All ages", "0 to 4 years",
"5 to 9 years", "10 to 14 years", "15 to 19 years", "20 to 24 years",
"25 to 29 years", "30 to 34 years", "35 to 39 years", "40 to 44 years",
"45 to 49 years", "50 to 54 years", "55 to 59 years", "60 to 64 years",
"65 to 69 years", "70 to 74 years", "75 to 79 years", "80 to 84 years",
"85 to 89 years", "90 to 94 years", "95 to 99 years", "100 years and over",
"Median age"), class = "factor"), population = c(1836149, 2267794,
2329323, 2164092, 1976914, 1643264, 1342744, 1286302, 1284154,
1252545, 1065664, 964984, 785693, 626521, 462065, 328583, 206174,
101117), age_min = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
55, 60, 65, 70, 75, 80, 85), age_max = c(4, 9, 14, 19, 24, 29,
34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -18L))
# adjust works as expected on diamonds
ggplot(diamonds, aes(carat)) +
  geom_density(adjust = 1)
ggplot(diamonds, aes(carat)) +
  geom_density(adjust = 5)
# On the population data (df above), adjust is ignored
ggplot(df, aes(x = age_min, y = population)) +
  geom_density(stat = "identity", adjust = 5)
# Warning: Ignoring unknown parameters: adjust
Answer 0 (score: 1)
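geom_density() is most commonly used as a smoothed histogram: it shows the density of a vector across its values, for example how much more often 5 occurs than 30. The function takes your x values, computes a kernel density estimate whose smoothness is controlled by adjust or bw, and outputs the density to y. Using stat = "identity" here overrides that computation and plots the y you supplied, i.e. population, directly. With no density estimate being computed, adjust has nothing to act on, which is why it is ignored.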
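As a rough illustration of what adjust does, the sketch below calls base R's stats::density() directly, which is the function the default density stat relies on. The sample x is made up for the example and is not part of the question's data.

```r
# Hypothetical sample, only for illustration (not the question's data).
set.seed(42)
x <- rnorm(500)

# adjust is a multiplier applied to the automatically chosen bandwidth.
d1 <- density(x, adjust = 1)
d5 <- density(x, adjust = 5)

# The bandwidth actually used is stored in the result:
# d5$bw is five times d1$bw, so the second curve is much smoother.
d1$bw
d5$bw
```

With stat = "identity" no such estimate is computed at all, so there is no bandwidth for adjust to scale.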
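Your data is currently in aggregated form: each row holds the head count for a whole age group, rather than there being one row per person. We can undo that aggregation with tidyr::uncount, which duplicates each row the number of times you specify. So if a row has a population of 5 million, we could copy it 5 million times. That is almost certainly overkill, though; we can instead copy each row a smaller number of times that is proportional to its population: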
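# floor() keeps the weights whole numbers; newer tidyr releases may reject fractional weights
df %>% uncount(floor(population / 1e4))
# Produces one row for every 10,000 people; in this case 2,183 rows, corresponding to the
# roughly 22M total population in the data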
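To make the row expansion concrete, here is a minimal base R sketch of the same idea using rep(). It is only an illustration, not how uncount is implemented, and the vector names copies and expanded are made up for the example. The numbers are copied from the question's data.

```r
# Lower bound of each age group and its population, from the question's data.
age_min <- seq(0, 85, by = 5)
population <- c(1836149, 2267794, 2329323, 2164092, 1976914, 1643264,
                1342744, 1286302, 1284154, 1252545, 1065664, 964984,
                785693, 626521, 462065, 328583, 206174, 101117)

# One copy per 10,000 people, rounded down to a whole number of copies.
copies <- floor(population / 1e4)

# Repeat each age value once per copy: one element per 10,000 people.
expanded <- rep(age_min, times = copies)

length(expanded)  # 2183
head(copies)      # 183 226 232 216 197 164
```

The expanded vector is what a density estimate needs: many individual observations whose frequency reflects the population of each group.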
With the data in this expanded form, adjust controls the smoothing in geom_density as usual:
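# Less smoothing: the curve follows the age groups more closely
ggplot(df %>% uncount(floor(population / 1e4)), aes(x = age_min)) +
  geom_density(adjust = 0.7)
# More smoothing: a broader, flatter curve
ggplot(df %>% uncount(floor(population / 1e4)), aes(x = age_min)) +
  geom_density(adjust = 1.5)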
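A possible alternative, sketched here in base R and not part of the approach above, is to skip the row expansion and pass the populations as weights to stats::density(). The names w, bw0, d_rough and d_smooth are made up for the example, and choosing the base bandwidth with bw.nrd0() on the 18 group values is an assumption, so the curves will not match the uncount-based plots exactly.

```r
# Lower bound of each age group and its population, from the question's data.
age_min <- seq(0, 85, by = 5)
population <- c(1836149, 2267794, 2329323, 2164092, 1976914, 1643264,
                1342744, 1286302, 1284154, 1252545, 1065664, 964984,
                785693, 626521, 462065, 328583, 206174, 101117)

# density() expects weights that sum to 1.
w <- population / sum(population)

# Assumption: take the base bandwidth from the unweighted group values.
bw0 <- bw.nrd0(age_min)

# adjust scales that bandwidth, just as in the geom_density() calls.
d_rough  <- density(age_min, bw = bw0, adjust = 0.7, weights = w)
d_smooth <- density(age_min, bw = bw0, adjust = 1.5, weights = w)

# The more heavily smoothed curve has the lower peak.
max(d_rough$y) > max(d_smooth$y)
```

Calling plot(d_smooth) followed by lines(d_rough, lty = 2) would draw the two curves for comparison.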