R - qplot vs. geom_bar vs. geom_histogram

Time: 2017-10-06 13:38:37

Tags: r ggplot2

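I have three ways of making a plot, and each one is a single step away from what I want. I am using the training data set from Kaggle's Titanic competition, and I want a plot faceted on Pclass (socio-economic class) in which each bar shows the percentage who lived or died (variable = Survived, binary) within that facet. I also want the bars colored by the binary variable. Here are my three plots: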

g <- ggplot(training, aes(Survived, y = ..prop.., group = Survived))
g <- g + geom_bar(aes(fill = Survived), position = "dodge", stat = "count")
g <- g + facet_grid(~Pclass)
g <- g + scale_y_continuous(labels = scales::percent)
g <- g + labs(x = "1 = Upper Class        |        2 = Middle Class        |        3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
g

q <- qplot(x = Survived, y = ..prop.., data = training, geom = "bar",
      fill = Survived, facets = ~Pclass, stat = "count") +
      scale_y_continuous(labels = scales::percent) +
      labs(x = "1 = Upper Class        |        2 = Middle Class        |        3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
q


f <- ggplot(training, aes(Survived, group = Survived))
f <- f + geom_histogram(aes(fill = Survived), position = "fill", stat = "count")
f <- f + facet_grid(~Pclass)
f <- f + scale_y_continuous(labels = scales::percent)
f <- f + labs(x = "1 = Upper Class        |        2 = Middle Class        |        3 = Lower Class", y = "Count", title = "The Probability of Living Given Socio-Economic Status")
f

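They all look exactly the same. The only problem is that the lived and died bars in every plot each equal 100%. Any ideas on how to get the percentages right within each facet?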
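To make the 100% symptom concrete, here is a minimal base-R sketch on made-up data (the `toy` data frame and its values are assumptions, not the Kaggle data). It contrasts a proportion computed inside each Survived group, which is what `group = Survived` asks for, with a proportion computed inside each Pclass.

```r
# Made-up stand-in for the training data; only the two relevant columns.
toy <- data.frame(
  Pclass   = c(1, 1, 1, 1, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 0, 0, 0, 1)
)

# Counts per class (rows) and survival status (columns).
counts <- table(Pclass = toy$Pclass, Survived = toy$Survived)

# With group = Survived, each (facet, group) cell holds a single bar,
# so the bar is divided by its own total and always comes out as 1.
within_group <- counts / counts

# Dividing by the class total instead gives the share of each outcome
# within a class; every row sums to 1.
by_class <- prop.table(counts, margin = 1)

within_group
by_class
```

In this toy example the first table is all ones, while the second splits each class into a died share and a lived share.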

2 answers:

Answer 0 (score: 1)

I think this is what you are aiming for. To get group percentages with facets, use geom_bar with ..prop.. and specify the facet variable as the group:

f <- ggplot(training, 
             aes(y=Survived, 
                 x=factor(Survived, labels=c("Died","Lived"))))
f <- f + geom_bar(aes(y=..prop.., group=Pclass, 
                      fill=factor(..x.., labels=c("Died","Lived"))))
f <- f + facet_grid(~factor(Pclass, 
                        labels=c("Upper Class", "Middle Class", "Lower Class")))
f <- f + scale_y_continuous(labels = scales::percent) 
f <- f + scale_fill_discrete(name="Survival Status")
f <- f + labs(x="", y = "Percentage", title = "The Probability of Living Given Socio-Economic Status")
f

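There is still something odd about the fill argument, though. The code above works, but I don't know why it won't accept Survived, or why you have to recompute it from x the way I did.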
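Recent ggplot2 releases deprecate the `..name..` notation in favor of `after_stat()`, so the same idea can be sketched as below. The `training` data frame here is a made-up stand-in, not the Kaggle data. One possible explanation for the fill behavior, offered as an assumption rather than something confirmed in this thread: with `group = Pclass`, Survived is not constant within a group, so the stat may not carry it through, whereas x is part of the stat's own output and is available afterwards.

```r
library(ggplot2)

# Made-up stand-in for the Kaggle training data (assumption for illustration).
training <- data.frame(
  Pclass   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)
)

p <- ggplot(training, aes(x = factor(Survived, labels = c("Died", "Lived")))) +
  # prop is computed within each group; grouping by the facet variable
  # makes the two bars in a facet sum to 100%.
  geom_bar(aes(y = after_stat(prop), group = Pclass,
               # fill is derived from the computed x position (1 or 2)
               fill = factor(after_stat(x), labels = c("Died", "Lived")))) +
  facet_grid(~Pclass) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Percentage", fill = "Survival Status")
p
```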

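As a side note, when you have two bars whose percentages add up to 100%, showing them side by side may not be the best choice. You may want to stack them to show the proportions more clearly.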
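A stacked version could look roughly like the sketch below. It puts Pclass on the x axis and uses `position = "fill"` so each bar is scaled to 100%. The `training` data frame is a made-up stand-in (an assumption), and dropping the facets is a design choice of this sketch, not part of the answer above.

```r
library(ggplot2)

# Made-up stand-in for the Kaggle training data (assumption for illustration).
training <- data.frame(
  Pclass   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)
)

p <- ggplot(training,
            aes(x = factor(Pclass,
                           labels = c("Upper Class", "Middle Class", "Lower Class")),
                fill = factor(Survived, labels = c("Died", "Lived")))) +
  # position = "fill" stacks the counts and rescales each bar to height 1
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Percentage", fill = "Survival Status")
p
```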

Answer 1 (score: 0)

I'm not sure about your "y = ..prop.." argument. The code below computes the survival and death rates ahead of time, and they plot fine.

library(tidyverse)

training %>% 
  group_by(Pclass) %>% 
  summarise(
    survival_rate = mean(Survived),
    death_rate = 1 - survival_rate
  ) %>% 
  gather(survival_rate, death_rate, key = rate_type, value = rate) %>% 
  ggplot(., aes(x = rate_type, y = rate, fill = rate_type)) + 
  geom_col(position = "dodge") + 
  facet_grid(~Pclass, labeller = as_labeller(c(
    "1" = "First Class", "2" = "Second Class", "3" = "Third Class"))
  ) + 
  scale_y_continuous(labels = scales::percent) + 
  labs(x = NULL, 
       y = "Survival Rate", 
       title = "The Probability of Living Given Socio-Economic Status")
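In current tidyr, `gather()` is superseded by `pivot_longer()`. The summarising step above could be written as in the sketch below, shown on a made-up `training` data frame (an assumption for illustration) so the resulting rates can be inspected before plotting.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the Kaggle training data (assumption for illustration).
training <- data.frame(
  Pclass   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)
)

rates <- training %>%
  group_by(Pclass) %>%
  summarise(
    survival_rate = mean(Survived),   # share of passengers who lived
    death_rate    = 1 - survival_rate # complement within the same class
  ) %>%
  # Reshape to one row per class and rate type, as gather() did above.
  pivot_longer(c(survival_rate, death_rate),
               names_to = "rate_type", values_to = "rate")
rates
```

The `rates` table can then be passed to the same `ggplot()` call with `geom_col()` and `facet_grid(~Pclass)`.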