How to sum a column in R based on categories in 2 different groups

Time: 2015-06-11 17:31:01

Tags: r sorting

I imported the following dataset from Excel using the readxl::read_excel command:

Municipality    Production  Type
Atima           690         Reverification
Atima           120         Reverification
Atima           220         Reverification
Comayagua       153         Initial
Comayagua       193         Initial
Comayagua       138         Initial
Comayagua       307         Reverification
Copán           179         Initial
Copán           100         Initial
Copán           236         Reverification
Copán           141         Reverification
Danlí            56         Reverification
...

I then ran the code below, because the imported data is a tbl_df.

df <- as.data.frame(df) 

I would like to sort the data by type and add up Production to get the total production for each municipality and type:

Municipality    Production  Type
Atima           1030        Reverification
Comayagua       484         Initial
Comayagua       307         Reverification
Copán           279         Initial
Copán           377         Reverification
Danlí            56         Reverification

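I searched other posts, but I could only find how to summarize by a single categorical variable. How can I do this in R? Or should I do it in Excel first and then import the table?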

I am working in RStudio version 0.99.441 on Windows 7.

Thanks in advance for your help.

2 Answers:

Answer 0: (score: 1)

Use one of the aggregating functions:

library(data.table)
# Convert df1 to a data.table by reference, then sum Production per group
setDT(df1)[, list(Production = sum(Production)), by = .(Municipality, Type)]
 #    Municipality           Type Production
 # 1:        Atima Reverification       1030
 # 2:    Comayagua        Initial        484
 # 3:    Comayagua Reverification        307
 # 4:        Copán        Initial        279
 # 5:        Copán Reverification        377
 # 6:        Danlí Reverification         56

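# Or with base R: `.` stands for every other column (Municipality and Type)
res <- aggregate(Production ~ ., df1, FUN = sum)
# Sort by Municipality, then by descending Production
res1 <- res[with(res, order(Municipality, -Production)), ]
row.names(res1) <- NULL
res1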
#  Municipality           Type Production
#1        Atima Reverification       1030
#2    Comayagua        Initial        484
#3    Comayagua Reverification        307
#4        Copán Reverification        377
#5        Copán        Initial        279
#6        Danlí Reverification         56
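The question asks for the totals sorted by type, while the base R code above orders by municipality and descending production. The following is a minimal sketch of one way to do that, assuming the same columns as the example data. It names both grouping columns in the formula instead of relying on `.`, which also protects against extra columns being picked up as groups. The object name `totals` is only illustrative.

```r
# Same values as the example data in the question.
df1 <- data.frame(
  Municipality = c("Atima", "Atima", "Atima", "Comayagua", "Comayagua",
                   "Comayagua", "Comayagua", "Copán", "Copán", "Copán",
                   "Copán", "Danlí"),
  Production = c(690L, 120L, 220L, 153L, 193L, 138L, 307L, 179L, 100L,
                 236L, 141L, 56L),
  Type = c("Reverification", "Reverification", "Reverification", "Initial",
           "Initial", "Initial", "Reverification", "Initial", "Initial",
           "Reverification", "Reverification", "Reverification"),
  stringsAsFactors = FALSE
)

# Name both grouping columns explicitly rather than using `.`.
totals <- aggregate(Production ~ Municipality + Type, data = df1, FUN = sum)

# Sort by Type first, then by Municipality within each type.
totals <- totals[order(totals$Type, totals$Municipality), ]
row.names(totals) <- NULL
print(totals)
```

A quick sanity check is that `sum(totals$Production)` equals `sum(df1$Production)`, since every input row belongs to exactly one group.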

Data

df1 <- structure(list(Municipality = c("Atima", "Atima", "Atima", "Comayagua", 
"Comayagua", "Comayagua", "Comayagua", "Copán", "Copán", "Copán", 
"Copán", "Danlí"), Production = c(690L, 120L, 220L, 153L, 193L, 
138L, 307L, 179L, 100L, 236L, 141L, 56L), Type = c("Reverification", 
"Reverification", "Reverification", "Initial", "Initial", "Initial", 
"Reverification", "Initial", "Initial", "Reverification", "Reverification", 
"Reverification")), .Names = c("Municipality", "Production", 
"Type"), class = "data.frame", row.names = c(NA, -12L))

Answer 1: (score: 1)

Here is a way to do this using dplyr:

library(dplyr)
df %>%
  group_by(Municipality, Type) %>%
  summarize(Production=sum(Production))
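As a hedged extension of the dplyr approach, the sketch below also sorts the result by type, which the question asked for. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and uses only a few rows from the example data. The name `totals` is illustrative. Without `.groups = "drop"`, the result stays grouped by Municipality, which can surprise later steps.

```r
library(dplyr)

# A few rows taken from the example data in the question.
df <- data.frame(
  Municipality = c("Atima", "Atima", "Atima",
                   "Comayagua", "Comayagua", "Comayagua", "Comayagua"),
  Production = c(690L, 120L, 220L, 153L, 193L, 138L, 307L),
  Type = c("Reverification", "Reverification", "Reverification",
           "Initial", "Initial", "Initial", "Reverification"),
  stringsAsFactors = FALSE
)

totals <- df %>%
  group_by(Municipality, Type) %>%
  # Sum within each Municipality/Type pair and return an ungrouped result.
  summarise(Production = sum(Production), .groups = "drop") %>%
  # Sort by Type first, then by Municipality within each type.
  arrange(Type, Municipality)

print(totals)
```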