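I have three ways of making this plot, and each one is just one step away from what I want. I'm using the training data set from Kaggle's Titanic competition and want to plot by Pclass (socio-economic class), where each bar is the percentage who lived or died (variable = Survived, binary) within that facet. I'd also like the bars colored by that binary variable. Here are my three plots: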
g <- ggplot(training, aes(Survived, y = ..prop.., group = Survived))
g <- g + geom_bar(aes(fill = Survived), position = "dodge", stat = "count")
g <- g + facet_grid(~Pclass)
g <- g + scale_y_continuous(labels = scales::percent)
g <- g + labs(x = "1 = Upper Class | 2 = Middle Class | 3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
g
q <- qplot(x = Survived, y = ..prop.., data = training, geom = "bar",
fill = Survived, facets = ~Pclass, stat = "count") +
scale_y_continuous(labels = scales::percent) +
labs(x = "1 = Upper Class | 2 = Middle Class | 3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
q
f <- ggplot(training, aes(Survived, group = Survived))
f <- f + geom_histogram(aes(fill = Survived), position = "fill", stat = "count")
f <- f + facet_grid(~Pclass)
f <- f + scale_y_continuous(labels = scales::percent)
f <- f + labs(x = "1 = Upper Class | 2 = Middle Class | 3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
f
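They all look exactly the same, and the only problem is that in every plot the lived bar and the died bar each come out at 100%. Any ideas how to get the percentages right within each facet?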
Answer 0 (score: 1)
I think this is what you're after. To get group percentages with facets, use geom_bar with ..prop.. and set the facet variable as the group:
f <- ggplot(training,
            aes(x = factor(Survived, labels = c("Died", "Lived"))))
# y = ..prop.. is each bar's share of its group; group = Pclass makes the
# whole facet panel one group, so the two bars in each panel sum to 100%
f <- f + geom_bar(aes(y = ..prop.., group = Pclass,
                      fill = factor(..x.., labels = c("Died", "Lived"))))
# Facet by class, with readable panel labels
f <- f + facet_grid(~factor(Pclass,
                            labels = c("Upper Class", "Middle Class", "Lower Class")))
f <- f + scale_y_continuous(labels = scales::percent)
f <- f + scale_fill_discrete(name = "Survival Status")
f <- f + labs(x = "", y = "Percentage", title = "The Probability of Living Given Socio-Economic Status")
f
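With ggplot2 3.3.0 or later, the same idea can be written with `after_stat()`, which supersedes the `..prop..` / `..x..` notation. The sketch below assumes a small made-up `training` data frame (hypothetical values, not the Kaggle data) and uses `group = 1`, so that inside each facet panel `prop` is the bar count divided by the panel total. A plausible reason `fill = Survived` is not accepted is that stat_count works per group: when every bar in a panel shares one group, a fill variable that varies inside that group cannot be carried through the statistic, so the fill has to be derived from the computed `x`. Treat that as an interpretation of ggplot2's behavior, not a documented rule.

```r
library(ggplot2)

# Hypothetical stand-in for the Kaggle `training` data (made-up values)
training <- data.frame(
  Pclass   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)
training$Status <- factor(training$Survived, levels = c(0, 1),
                          labels = c("Died", "Lived"))

p <- ggplot(training, aes(x = Status)) +
  # group = 1: a single group per panel, so prop = count / panel total;
  # fill is built from the computed x rather than from Survived
  geom_bar(aes(y = after_stat(prop), group = 1,
               fill = after_stat(factor(x)))) +
  facet_grid(~Pclass) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_discrete(name = "Survival Status", labels = c("Died", "Lived")) +
  labs(x = NULL, y = "Percentage")

p
```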
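There is still something odd about the fill argument, though. The code above works, but I don't know why it won't accept Survived, or why you have to recompute x the way I did.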
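As a side note, when a bar chart shows two percentages that add up to 100%, placing them side by side may not be the best choice. You might want to stack them instead so the proportions are clearer.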
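A stacked version might look like the sketch below, with one bar per class and `position = "fill"` rescaling each bar to a total height of 100%. The `training` data frame here is again a made-up stand-in (hypothetical values); with the real data, only the data-frame definition would change.

```r
library(ggplot2)

# Hypothetical stand-in for the Kaggle `training` data (made-up values)
training <- data.frame(
  Pclass   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)

p <- ggplot(training,
            aes(x = factor(Pclass, labels = c("Upper Class", "Middle Class", "Lower Class")),
                fill = factor(Survived, levels = c(0, 1), labels = c("Died", "Lived")))) +
  # position = "fill" stacks the counts and rescales each bar to sum to 1
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Percentage", fill = "Survival Status",
       title = "The Probability of Living Given Socio-Economic Status")

p
```

Stacking also removes the need for facets, because the class is now on the x axis.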
Answer 1 (score: 0)
I'm not sure about your `y = ..prop..` argument. The code below computes the survival and death rates up front, and they plot fine.
library(tidyverse)
training %>%
group_by(Pclass) %>%
summarise(
survival_rate = mean(Survived),
death_rate = 1 - survival_rate
) %>%
gather(survival_rate, death_rate, key = rate_type, value = rate) %>%
ggplot(., aes(x = rate_type, y = rate, fill = rate_type)) +
geom_col(position = "dodge") +
facet_grid(~Pclass, labeller = as_labeller(c(
"1" = "First Class", "2" = "Second Class", "3" = "Third Class"))
) +
scale_y_continuous(labels = scales::percent) +
labs(x = NULL,
y = "Survival Rate",
title = "The Probability of Living Given Socio-Economic Status")
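The rates that `summarise()` produces can be cross-checked without plotting: because `Survived` is coded 0/1, `mean(Survived)` is the survival proportion, and a row-wise proportion table gives both rates for every class at once. The data frame below is a made-up stand-in for `training` (hypothetical values).

```r
# Hypothetical stand-in for the Kaggle `training` data (made-up values)
training <- data.frame(
  Pclass   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)

# Rows = class, columns = Survived (0 = died, 1 = lived);
# margin = 1 divides each cell by its row total, so every row sums to 1
rates <- prop.table(table(Pclass = training$Pclass, Survived = training$Survived),
                    margin = 1)
rates
```

The `"1"` column should match `survival_rate` and the `"0"` column `death_rate` for each class.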