Group by multiple factors and summarise factor counts

Posted: 2019-05-02 18:58:47

Tags: r group-by dplyr factors summarize

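I have a bunch of data classifying ship "Type" (e.g. passenger, fishing, cargo) at different distances offshore (DOS, e.g. 0-12 nm, 0-25 nm, etc.) in different months of the year.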

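Initially, I want to get the number of ships of each Type (e.g. passenger) in each DOS over the whole dataset / whole year. Then I want to do the same for each month of the year.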

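I assume this will be some kind of group_by followed by summarise? But I'm spending a lot of time trying to get the output, because I'm not yet good enough with dplyr.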

Some things I've tried:

ships <- df %>% group_by(DOS, Type)
shipc <- summarize(ships, count = n())

df1 <- gather(df, Type, DOS) %>% count(Type, DOS) %>% spread(DOS, n, fill = 0)
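For reference, the first attempt above is already valid: group_by() followed by summarize(count = n()) returns the long-format count of each DOS/Type combination. A small sketch on a made-up `toy_ships` tibble (hypothetical data, not the question's df) comparing that pattern with dplyr::count(), which is shorthand for the same thing:

```r
library(dplyr)

# Hypothetical toy data (not the question's df)
toy_ships <- tibble(
  Type = c("Cargo ship", "Cargo ship", "Fishing", "Fishing", "Tanker"),
  DOS  = c("0-100", "0-100", "0-25", "0-100", "0-50")
)

# Same idea as the first attempt: group, then count the rows in each group.
# The column is called n here only to match count()'s default output name.
by_summarise <- toy_ships %>%
  group_by(DOS, Type) %>%
  summarise(n = n()) %>%
  ungroup()

# count() is shorthand for group_by() + summarise(n = n())
by_count <- toy_ships %>%
  count(DOS, Type)

by_count
```

Both produce one row per DOS/Type combination, sorted by DOS and then Type.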

But I'm pretty sure these aren't working correctly, because I don't properly understand the syntax...

Here is some dummy data:

df <- structure(list(Type = c("Cargo ship", "Cargo ship", "Cargo ship", 
"Cargo ship", "Cargo ship", "Cargo ship", "Fishing", "Fishing", 
 "Fishing", "Fishing", "Fishing", "Cargo ship", "Cargo ship", 
 "Cargo ship", "Cargo ship", "Cargo ship", "Fishing", "Fishing", 
"Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Fishing", 
"Fishing", "Cargo ship:DG,HS,MP(A)", "Cargo ship", "Cargo ship", 
"Fishing", "Fishing", "Fishing", "Fishing", "Fishing", "Tanker", 
 "Cargo ship", "Cargo ship", "Fishing", "Fishing", "Cargo ship:DG,HS,MP(A)", 
 "Cargo ship:DG,HS,MP(D)", "Cargo ship:DG,HS,MP(D)", "Cargo ship:DG,HS,MP(D)", 
 "Cargo ship"), DOS = c("0-100", "0-50", 
 "0-25", "0-100", "0-25", "0-12", "0-50", "0-100", "0-50", "0-100", 
 "0-25", "0-50", "0-100", "0-50", "0-25", "0-50", "0-100", "0-25", 
 "0-100", "0-100", "0-50", "0-25", "0-100", "0-100", "0-50", "0-100", 
 "0-50", "0-25", "0-100", "0-100", "0-100", "0-50", "0-100", "0-100", 
 "0-100", "0-100", "0-25", "0-50", "0-100", "0-100"), Month = c("May", 
 "May", "May", "May", "May", "May", "May", "May", "May", "May", 
 "June", "June", "June", "June", "June", "June", "June", "June", 
 "June", "June", "June", "August", "August", "August", "August", 
 "August", "August", "August", "August", "August", "August", "August", 
 "January", "January", "January", "January", "January", "January", 
 "January", "January", "January", "January", "January"), Year = c(2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 
 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019)), row.names = c(NA, 
-43L), class = c("tbl_df", "tbl", "data.frame"))

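What I want is the Type category, the DOS, and the total number of ships belonging to each of these unique combinations. Beyond that, I'd like to group by month and year as well.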

1 answer

Answer (score: 1)

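The expected output isn't entirely clear. Going by the description: group by all columns (group_by_all), get the frequency count with n(), and spread the result to "wide" format: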

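library(dplyr)
library(tidyr)   # spread() comes from tidyr, not dplyr

df %>% 
   group_by_all() %>%          # one group per unique Type/DOS/Month/Year combination
   summarise(n = n()) %>%      # count the rows (ships) in each group
   ungroup() %>% 
   spread(DOS, n, fill = 0)    # one column per DOS band; missing combinations become 0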
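Since tidyr 1.0.0, spread() has been superseded by pivot_wider(). As a hedged sketch (assuming tidyr >= 1.0.0, and using a small made-up `ships` tibble in place of the question's df), the same group_by_all() + summarise() pipeline can be finished with pivot_wider() instead:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same shape as the question's df;
# the values are made up for illustration only.
ships <- tibble(
  Type  = c("Cargo ship", "Cargo ship", "Fishing", "Fishing", "Fishing", "Tanker"),
  DOS   = c("0-100", "0-100", "0-25", "0-25", "0-100", "0-50"),
  Month = c("May", "May", "May", "June", "June", "June"),
  Year  = c(2018, 2018, 2018, 2018, 2018, 2018)
)

wide <- ships %>%
  group_by_all() %>%                      # one group per unique Type/DOS/Month/Year
  summarise(n = n()) %>%                  # number of ships in each group
  ungroup() %>%
  pivot_wider(names_from  = DOS,          # one column per DOS band
              values_from = n,
              values_fill = list(n = 0))  # combinations with no ships become 0
wide
```

The named-list form of values_fill is used because early tidyr 1.0.x releases only accepted that form; newer releases also accept a plain 0.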

Or use count (shorthand for group_by + summarise) and spread:

df %>% 
  dplyr::count(Type, DOS, Month, Year) %>%   # same as group_by(...) + summarise(n = n())
  tidyr::spread(DOS, n, fill = 0)            # spread() lives in tidyr
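The question asks for two levels of counting: ships per Type and DOS over the whole dataset, then the same counts per month and year. A minimal sketch of both with count(), again on a made-up `ships` tibble and assuming tidyr >= 1.0.0 for pivot_wider(). Month is converted to a factor using base R's month.name so the monthly rows come out in calendar order rather than alphabetical order:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; substitute the question's df.
ships <- tibble(
  Type  = c("Cargo ship", "Cargo ship", "Fishing", "Fishing", "Fishing", "Tanker"),
  DOS   = c("0-100", "0-100", "0-25", "0-25", "0-100", "0-50"),
  Month = c("May", "May", "May", "June", "June", "June"),
  Year  = c(2018, 2018, 2018, 2018, 2018, 2018)
)

# Level 1: whole dataset, number of ships of each Type in each DOS band
overall <- ships %>%
  count(Type, DOS) %>%
  pivot_wider(names_from = DOS, values_from = n, values_fill = list(n = 0))

# Level 2: the same counts broken down by Year and Month.
# A factor with levels = month.name sorts May before June (calendar order).
monthly <- ships %>%
  mutate(Month = factor(Month, levels = month.name)) %>%
  count(Year, Month, Type, DOS) %>%
  pivot_wider(names_from = DOS, values_from = n, values_fill = list(n = 0))

overall
monthly
```

count() drops months with no rows by default, so only months present in the data appear in `monthly`.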