Summarizing by multiple groups with dplyr

Posted: 2016-04-27 15:08:57

Tags: r

I'm trying to use dplyr to summarize a dataset by two groups, "Year" and "Area". This is what the dataset looks like:

  Year   Area Num
1 2000 Area 1  99
2 2001 Area 3  85
3 2000 Area 1  60
4 2003 Area 2  90
5 2002 Area 1  40
6 2002 Area 3  30
7 2004 Area 4  10
...

The final result should look like this:

  Year    Area Mean
1 2000 Area 1  100
2 2000 Area 2   80
3 2000 Area 3   89
4 2001 Area 1   80
5 2001 Area 2   85
6 2001 Area 3   59
7 2002 Area 1   90
8 2002 Area 2   88
... 

Please excuse the "Mean" values; they are made up.

Code for the sample dataset:

df <- structure(list(
   Year = c(2000, 2001, 2000, 2003, 2002, 2002, 2004), 
   Area = structure(c(1L, 3L, 1L, 2L, 1L, 3L, 4L), 
   .Label = c("Area 1", "Area 2", "Area 3", "Area 4"), 
   class = "factor"), 
   Num = structure(c(7L, 5L, 4L, 6L, 3L, 2L, 1L), 
   .Label = c("10", "30", "40", "60", "85", "90", "99"), 
   class = "factor")), 
   .Names = c("Year", "Area", "Num"), 
   class = "data.frame", row.names = c(NA, -7L))

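# Num is a factor: as.numeric() alone would return the level codes (7, 5, ...),
# so convert through the labels to get the actual values (99, 85, ...)
df$Num <- as.numeric(as.character(df$Num))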
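Converting a factor to numbers is an easy place to go wrong with this sample data. The sketch below rebuilds only the `Num` column as a factor with the same labels and codes as the question's `structure()` call (the names `num_factor`, `codes`, and `values` are illustrative) and compares converting it directly with converting it through its labels.

```r
# Same labels and level order as the Num column in the sample data
num_factor <- factor(c("99", "85", "60", "90", "40", "30", "10"),
                     levels = c("10", "30", "40", "60", "85", "90", "99"))

# as.numeric() on a factor returns the internal integer level codes
codes <- as.numeric(num_factor)
# Going through the character labels recovers the actual numbers
values <- as.numeric(as.character(num_factor))

print(codes)   # [1] 7 5 4 6 3 2 1
print(values)  # [1] 99 85 60 90 40 30 10
```

This is the conversion discussed in R FAQ 7.10; `as.numeric(levels(num_factor))[num_factor]` is an equivalent form.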

What I've tried:

df.meanYear <- df %>%
  group_by(Year) %>%
  group_by(Area) %>%                # replaces the Year grouping instead of adding to it
  summarize_each(funs(mean(Num)))   # sets every non-grouping column to mean(Num)

But it just replaces every value with the mean instead of producing the expected result.
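What the chained `group_by()` calls actually do can be checked with `group_vars()`. A minimal sketch, assuming dplyr 1.0.0 or later (for the `.add` argument) and a simplified copy of the sample data with `Num` already numeric and `Area` stored as character rather than factor:

```r
library(dplyr)

# Simplified copy of the sample data
df <- data.frame(
  Year = c(2000, 2001, 2000, 2003, 2002, 2002, 2004),
  Area = c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1", "Area 3", "Area 4"),
  Num  = c(99, 85, 60, 90, 40, 30, 10)
)

# By default, a second group_by() replaces the existing grouping
chained <- df %>% group_by(Year) %>% group_by(Area)
print(group_vars(chained))   # [1] "Area"

# .add = TRUE appends to the existing grouping instead
added <- df %>% group_by(Year) %>% group_by(Area, .add = TRUE)
print(group_vars(added))     # [1] "Year" "Area"

# Listing both columns in a single group_by() gives the same grouping
both <- df %>% group_by(Year, Area)
print(group_vars(both))      # [1] "Year" "Area"
```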

If possible, please also offer an alternative (i.e. non-dplyr) approach, since I'm still new to R.

2 answers:

Answer 0 (score: 6)

Is this what you're looking for?

 library(dplyr)
 df <- group_by(df, Year, Area)
 df <- summarise(df, avg = mean(Num))
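The same answer can be written as a single pipeline that leaves `df` untouched. A sketch assuming dplyr 1.0.0 or later for `.groups = "drop"`, which returns an ungrouped result; the column name `Mean` matches the desired output, `df_mean` is an illustrative name, and the data is the simplified numeric copy of the sample:

```r
library(dplyr)

# Simplified copy of the sample data, with Num already numeric
df <- data.frame(
  Year = c(2000, 2001, 2000, 2003, 2002, 2002, 2004),
  Area = c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1", "Area 3", "Area 4"),
  Num  = c(99, 85, 60, 90, 40, 30, 10)
)

# Group by both columns in one call, then collapse each group to its mean
df_mean <- df %>%
  group_by(Year, Area) %>%
  summarise(Mean = mean(Num), .groups = "drop")

print(df_mean)
```

With this data the result has six rows, one per Year/Area combination that actually occurs; for example, 2000/Area 1 gets 79.5, the mean of 99 and 60.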

Answer 1 (score: 0)

We can use data.table:

library(data.table)
setDT(df)[, .(avg = mean(Num)) , by = .(Year, Area)]
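The question also asks for a non-dplyr approach; base R's `aggregate()` with a formula handles the same case without installing any package. A minimal sketch on the simplified numeric copy of the sample data (`df_mean` is an illustrative name):

```r
# Simplified copy of the sample data, with Num already numeric
df <- data.frame(
  Year = c(2000, 2001, 2000, 2003, 2002, 2002, 2004),
  Area = c("Area 1", "Area 3", "Area 1", "Area 2", "Area 1", "Area 3", "Area 4"),
  Num  = c(99, 85, 60, 90, 40, 30, 10)
)

# Mean of Num for every Year/Area combination present in the data
df_mean <- aggregate(Num ~ Year + Area, data = df, FUN = mean)
print(df_mean)
```

The summary column keeps the name `Num`; rename it (for example `names(df_mean)[3] <- "Mean"`) if the `Mean` header from the question is wanted. Rows may also come out in a different order than in the dplyr version.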