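r - Calculating % within subgroups using dplyr

Time: 2018-12-24 10:58:42

Tags: r ggplot2 dplyr

I want to plot the relative number of fatalities by year for various event types.

I can handle the faceting in ggplot, but I am struggling to calculate the percentage per event from event type, year, and fatalities.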

Event Type  Year  Fatalities  % by Event (calculated)
----------  ----  ----------  -----------------------
Storm       1980           5  12.5%
Storm       1981           9  22.5%
Storm       1982          15  37.5%
Storm       1983          11  27.5%
Ice         1980           7  70%
Ice         1981           3  30%

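I have the following code to calculate it, but the calculation does not work: the resulting % appears to use a much higher denominator.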

fatalitiesByYearType <- stormDF %>% 
    group_by(eventType) %>% 
    mutate(totalEventFatalities = sum(FATALITIES)) %>%
    group_by(year, add = TRUE) %>% 
    mutate(fatalitiesPct =  sum(FATALITIES) / totalEventFatalities)

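What am I doing wrong?

My plot code is below. I am including it because I would also like to know whether there is a way to display the data proportionally within ggplot itself.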

p <- ggplot(data = fatalitiesByYearType,
    aes(x=factor(year),y=fatalitiesPct)) 
p + geom_bar(stat="identity") +
    facet_wrap(.~eventType, nrow = 5) +
    labs(x = "Year", 
         y = "Fatalities",
         title = "Fatalities by Type")

1 Answer:

Answer 0: (score: 1)

Maybe I have not understood your question, but we can start from here:

library(dplyr)
library(ggplot2)

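# dplyr part: share of each event type's fatalities per year
dats <- fatalitiesByYearType %>%
  group_by(eventType) %>%
  mutate(totalEventFatalities = sum(FATALITIES)) %>%
  # note: dplyr >= 1.0.0 spells this argument `.add = TRUE`
  group_by(year, add = TRUE) %>%
  # summarise() instead of mutate(): one row per eventType/year
  summarise(fatalitiesPct = sum(FATALITIES) / totalEventFatalities)
dats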
# A tibble: 6 x 3
# Groups:   eventType [?]
  eventType  year fatalitiesPct
  <fct>     <int>         <dbl>
1 Ice        1980         0.7  
2 Ice        1981         0.3  
3 Storm      1980         0.125
4 Storm      1981         0.225
5 Storm      1982         0.375
6 Storm      1983         0.275
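
One likely reason the original `mutate()` version plots too high: `mutate()` keeps every input row, so if `stormDF` has several rows per event type and year (an assumption, since the question does not show the raw data), each of those rows carries the full yearly share and `geom_bar(stat = "identity")` stacks them. A minimal sketch of a variant that tolerates such data sums per event type and year first, then divides by the event-type total. The `stormDF` values below are made up for illustration, and `.groups` assumes dplyr >= 1.0.0.

```r
library(dplyr)

# Hypothetical raw data with several rows per eventType/year
# (values invented so the totals match the table in the question)
stormDF <- data.frame(
  eventType  = c("Storm", "Storm", "Storm", "Storm", "Storm", "Ice", "Ice", "Ice"),
  year       = c(1980, 1980, 1981, 1982, 1983, 1980, 1980, 1981),
  FATALITIES = c(2, 3, 9, 15, 11, 4, 3, 3)
)

fatalitiesPctByYear <- stormDF %>%
  group_by(eventType, year) %>%
  # collapse to one row per eventType/year; keep the eventType grouping
  summarise(fatalities = sum(FATALITIES), .groups = "drop_last") %>%
  # sum(fatalities) is now the total within each eventType
  mutate(fatalitiesPct = fatalities / sum(fatalities)) %>%
  ungroup()

fatalitiesPctByYear
```

Because each event type and year ends up as exactly one row, the bars are not stacked and the shares within each event type add up to 1.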

You could of course pipe everything together into a single chain; here is the plotting part on its own:

# ggplot2 part
p <- ggplot(dats, aes(x = factor(year), y = fatalitiesPct)) +
  geom_bar(stat = "identity") +
  facet_wrap(. ~ eventType, nrow = 5) +
  labs(x = "Year", y = "Fatalities", title = "Fatalities by Type") +
  # show the y axis as percentages
  scale_y_continuous(labels = scales::percent)
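
On the second part of the question (getting the proportions inside ggplot rather than beforehand), one possible approach is to let `geom_bar()` sum the fatalities through the `weight` aesthetic and then rescale the computed counts per facet panel. This is only a sketch: it assumes ggplot2 >= 3.3.0 for `after_stat()`, and it relies on each facet panel corresponding to exactly one event type.

```r
library(ggplot2)

# Same sample data as in the answer
fatalitiesByYearType <- data.frame(
  eventType  = c("Storm", "Storm", "Storm", "Storm", "Ice", "Ice"),
  year       = c(1980, 1981, 1982, 1983, 1980, 1981),
  FATALITIES = c(5, 9, 15, 11, 7, 3)
)

p <- ggplot(fatalitiesByYearType,
            aes(x = factor(year), weight = FATALITIES)) +
  # count = sum of weights per bar; ave() gives the total per facet panel
  geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
  facet_wrap(~ eventType, ncol = 1) +
  labs(x = "Year", y = "Share of fatalities", title = "Fatalities by Type") +
  scale_y_continuous(labels = scales::percent)

# Inspect the values ggplot computed for the bars
layer_data(p)[, c("PANEL", "x", "y")]
```

Pre-computing the shares with dplyr, as above, is usually easier to check; the in-plot version mainly saves an intermediate data frame.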

With data:

fatalitiesByYearType <- read.table(text = "eventType year  FATALITIES
                                   Storm      1980           5
                                   Storm      1981           9
                                   Storm      1982          15
                                   Storm      1983          11
                                   Ice        1980           7
                                   Ice        1981           3", header = TRUE)