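I'm trying to make a chart showing the percentage of men and women with children under 18 in different age groups. I'd like a chart with two bars (one for men, one for women) side by side for each age group, and I'd like both bars to show the percentage with children at the bottom rather than at the top (stacked bars). I can't figure out how to make such a chart in ggplot2 and would really appreciate suggestions.
I calculated my grouped statistics using dplyr: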
library(dplyr)
kid18summary <- marsub %>%
  group_by(AgeGroup, sex, kid_under_18) %>%
  summarise(n = n()) %>%        # count per AgeGroup x sex x kid_under_18
  mutate(freq = n / sum(n))     # share within each AgeGroup x sex
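To illustrate what this pipeline computes, here is a minimal sketch on made-up data. The `toy` data frame and its values are assumptions, not the real `marsub`, and the `.groups` argument assumes dplyr 1.0.0 or later. After `summarise()` the last grouping variable (`kid_under_18`) is dropped, so `sum(n)` in `mutate()` runs within each `AgeGroup` and `sex` combination and the frequencies add up to 1 per combination.

```r
library(dplyr)

# Hypothetical stand-in for marsub: one row per respondent
toy <- data.frame(
  AgeGroup     = c("Age<40", "Age<40", "Age<40", "Age<40", "Age<40", "Age<40"),
  sex          = c("Male", "Male", "Male", "Female", "Female", "Female"),
  kid_under_18 = c("Yes", "No", "No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

toy_summary <- toy %>%
  group_by(AgeGroup, sex, kid_under_18) %>%
  summarise(n = n(), .groups = "drop_last") %>%  # AgeGroup and sex stay as groups
  mutate(freq = n / sum(n))                      # share within each AgeGroup x sex
```

Here the male rows get frequencies of 1/3 ("Yes") and 2/3 ("No"), and the female rows the reverse.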
which produced this:
dput(kid18summary)
structure(list(AgeGroup = c("Age<40", "Age<40", "Age<40", "Age<40",
"Age41-49", "Age41-49", "Age41-49", "Age41-49", "Age50-64", "Age50-64",
"Age50-64", "Age50-64"), sex = structure(c(1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("Male", "Female"), class = "factor"),
kid_under_18 = c("No", "Yes", "No", "Yes", "No", "Yes", "No",
"Yes", "No", "Yes", "No", "Yes"), freq = c(0.625, 0.375,
0.636833046471601, 0.363166953528399, 0.349557522123894,
0.650442477876106, 0.444897959183673, 0.555102040816327,
0.724852071005917, 0.275147928994083, 0.819548872180451,
0.180451127819549)), .Names = c("AgeGroup", "sex", "kid_under_18",
"freq"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -12L), vars = list(AgeGroup, sex), drop = TRUE, indices = list(
0:1, 2:3, 4:5, 6:7, 8:9, 10:11), group_sizes = c(2L, 2L,
2L, 2L, 2L, 2L), biggest_group_size = 2L, labels = structure(list(
AgeGroup = c("Age<40", "Age<40", "Age41-49", "Age41-49",
"Age50-64", "Age50-64"), sex = structure(c(1L, 2L, 1L, 2L,
1L, 2L), .Label = c("Male", "Female"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L), vars = list(AgeGroup, sex), drop = TRUE, .Names = c("AgeGroup",
"sex")))
I can plot the proportion with or without children under 18 for each age group and sex:
library(ggplot2)
library(scales)  # provides percent for the axis labels
ggplot(kid18summary, aes(x = factor(AgeGroup), y = freq, fill = factor(sex)), color = factor(sex)) +
geom_bar(position = "dodge", stat = "identity") + scale_y_continuous(labels = percent)
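If only the share with children is wanted in a side-by-side chart, one possible approach is to keep just the "Yes" rows before dodging, so each bar shows a single value. This is a minimal sketch, not necessarily what the asker intended: the `kids` data frame is a plain rebuild of the `dput()` values, the names `kids`, `with_kids`, and `p_dodge` are made up for illustration, and `geom_col()` assumes ggplot2 2.2.0 or later.

```r
library(ggplot2)
library(scales)

# Values copied from the dput() output, rebuilt as an ungrouped data frame
kids <- data.frame(
  AgeGroup     = rep(c("Age<40", "Age41-49", "Age50-64"), each = 4),
  sex          = factor(rep(rep(c("Male", "Female"), each = 2), times = 3),
                        levels = c("Male", "Female")),
  kid_under_18 = rep(c("No", "Yes"), times = 6),
  freq         = c(0.625, 0.375, 0.636833046471601, 0.363166953528399,
                   0.349557522123894, 0.650442477876106,
                   0.444897959183673, 0.555102040816327,
                   0.724852071005917, 0.275147928994083,
                   0.819548872180451, 0.180451127819549)
)

# Keep only the "Yes" rows so each dodged bar is the share with children
with_kids <- subset(kids, kid_under_18 == "Yes")

p_dodge <- ggplot(with_kids, aes(x = AgeGroup, y = freq, fill = sex)) +
  geom_col(position = "dodge") +            # one bar per sex, side by side
  scale_y_continuous(labels = percent)
```

Printing `p_dodge` draws the chart; the "No" share is then implied by the empty space above each bar rather than drawn.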
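Or I can make a faceted stacked bar chart, which is closer to what I want: I'd like to show both "Yes" and "No", even though the percentages add up to 100, because I find the colored bars easier to compare against the negative space. The only trouble is that no matter what I do, "No" ends up on the bottom and "Yes" on top, and I'd like it the other way round. (Ideally, I'd really like different colors for men and women: dark blue for men with kids and light blue for men without; dark red for women with kids and light red for women without. I've given up on that for now.)
I've tried changing the order of the factor in various ways, all completely unsuccessfully.
As described in the ggplot2 documentation, I tried changing the order of the factor levels directly: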
kid18summary$kid_under_18 <- as.factor(kid18summary$kid_under_18)
o <- c("Yes", "No") # which I've also changed to ("No", "Yes"), which makes no difference; the order of the Yes and No in the legend changes, but the "Yes" bars stay on top
kid18summary$kid_under_18 <- factor(kid18summary$kid_under_18, levels = o)
kid18summary$kid_under_18 <- factor(kid18summary$kid_under_18, levels(kid18summary$kid_under_18)[c("Yes", "No")]) # changing to [c("No", "Yes")] also changes only the order of the legend
I've also tried the answer suggested in another question and added another ordered factor:
kid18summary <- transform(kid18summary, stack.ord = factor(kid_under_18, levels = c("Yes", "No"), ordered = TRUE))
ggplot(kid18summary, aes(x = factor(sex), y = freq, fill = factor(stack.ord)), color = factor(stack.ord)) + geom_bar(stat = "identity") + scale_y_continuous(labels = percent) + facet_wrap(~AgeGroup, nrow=1)
Or just adding another dummy variable:
kid18summary$orderfactor <- "NA"
kid18summary$orderfactor[kid18summary$kid_under_18 == "Yes"] <- 0
kid18summary$orderfactor[kid18summary$kid_under_18 == "No"] <- 1
ggplot(kid18summary, aes(x = factor(sex), y = freq, fill = factor(orderfactor)), color = factor(orderfactor)) + geom_bar(stat = "identity") + scale_y_continuous(labels = percent) + facet_wrap(~AgeGroup, nrow=1)
Answer 0 (score: 1)
Based on the answer suggested by aosmith, I ended up with the following, which is exactly what I wanted:
ggplot(arrange(df, kid_under_18), aes(x = factor(sex), y = freq, fill = interaction(sex, factor(kid_under_18))), color = factor(kid_under_18)) +
geom_bar(stat = "identity") + scale_y_continuous(labels = percent) +
facet_wrap(~AgeGroup, nrow=1)
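On newer ggplot2 releases (2.2.0 or later, as far as I know), the stacking order follows the factor levels of the fill variable rather than the row order, so the `arrange()` step may no longer have any effect. Under that assumption, a minimal sketch is to set the levels explicitly and flip the stack with `position_stack(reverse = TRUE)`. The `kids` data frame is a plain rebuild of the `dput()` values, and the names `kids`, `grp`, `p_stack`, and the four colors are illustrative choices, not part of the original answer.

```r
library(ggplot2)
library(scales)

# Values copied from the dput() output, rebuilt as an ungrouped data frame
kids <- data.frame(
  AgeGroup     = rep(c("Age<40", "Age41-49", "Age50-64"), each = 4),
  sex          = factor(rep(rep(c("Male", "Female"), each = 2), times = 3),
                        levels = c("Male", "Female")),
  kid_under_18 = rep(c("No", "Yes"), times = 6),
  freq         = c(0.625, 0.375, 0.636833046471601, 0.363166953528399,
                   0.349557522123894, 0.650442477876106,
                   0.444897959183673, 0.555102040816327,
                   0.724852071005917, 0.275147928994083,
                   0.819548872180451, 0.180451127819549)
)

# One fill level per sex x kid_under_18 combination, with "Yes" levels first:
# Male.Yes, Female.Yes, Male.No, Female.No
kids$grp <- interaction(kids$sex,
                        factor(kids$kid_under_18, levels = c("Yes", "No")))

p_stack <- ggplot(kids, aes(x = sex, y = freq, fill = grp)) +
  # reverse = TRUE puts the first level in each bar ("Yes") at the bottom
  geom_col(position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = c(Male.Yes = "darkblue", Male.No = "lightblue",
                               Female.Yes = "darkred", Female.No = "pink")) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~AgeGroup, nrow = 1)
```

An alternative with the same effect would be to list "Yes" as the last level and keep the default `position_stack()`; the reversed version keeps "Yes" first in the legend.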