I imported the following dataset from Excel with the command readxl::read_excel:
Municipality Production Type
Atima 690 Reverification
Atima 120 Reverification
Atima 220 Reverification
Comayagua 153 Initial
Comayagua 193 Initial
Comayagua 138 Initial
Comayagua 307 Reverification
Copán 179 Initial
Copán 100 Initial
Copán 236 Reverification
Copán 141 Reverification
Danlí 56 Reverification
...
I then ran the code below, because the imported data is a tbl_df.
df <- as.data.frame(df)
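I want to group the data by municipality and type and add up Production, so that I get the total production for each municipality and type: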
Municipality Production Type
Atima 1030 Reverification
Comayagua 484 Initial
Comayagua 307 Reverification
Copán 279 Initial
Copán 377 Reverification
Danlí 56 Reverification
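I looked through other posts, but I could only find how to summarize by a single categorical variable. How can I do this in R? Or should I do it in Excel first and then import the table?
I am working in RStudio version 0.99.441 on Windows 7.
Thanks in advance for your help.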
Answer 0: (score: 1)
Use one of the aggregating functions, for example with data.table:
library(data.table)
setDT(df1)[,list(Production=sum(Production)) , .(Municipality,Type)]
# Municipality Type Production
# 1: Atima Reverification 1030
# 2: Comayagua Initial 484
# 3: Comayagua Reverification 307
# 4: Copán Initial 279
# 5: Copán Reverification 377
# 6: Danlí Reverification 56
or, in base R with aggregate (here each municipality's rows are sorted by descending Production):
res <- aggregate(Production~., df1, FUN=sum)
res1 <- res[with(res, order(Municipality,-Production)),]
row.names(res1) <- NULL
res1
# Municipality Type Production
#1 Atima Reverification 1030
#2 Comayagua Initial 484
#3 Comayagua Reverification 307
#4 Copán Reverification 377
#5 Copán Initial 279
#6 Danlí Reverification 56
Data used above:
df1 <- structure(list(Municipality = c("Atima", "Atima", "Atima", "Comayagua",
"Comayagua", "Comayagua", "Comayagua", "Copán", "Copán", "Copán",
"Copán", "Danlí"), Production = c(690L, 120L, 220L, 153L, 193L,
138L, 307L, 179L, 100L, 236L, 141L, 56L), Type = c("Reverification",
"Reverification", "Reverification", "Initial", "Initial", "Initial",
"Reverification", "Initial", "Initial", "Reverification", "Reverification",
"Reverification")), .Names = c("Municipality", "Production",
"Type"), class = "data.frame", row.names = c(NA, -12L))
Answer 1: (score: 1)
Here is a way to do this with dplyr:
library(dplyr)
# Group by both categorical columns, then sum Production within each group
df %>%
  group_by(Municipality, Type) %>%
  summarize(Production = sum(Production))
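As a self-contained variant of the dplyr approach, the sketch below runs on the sample data from the question and also sorts the result and drops the grouping afterwards. It assumes dplyr 1.0.0 or later, where summarise() accepts the .groups argument; on older versions, omit that argument and call ungroup() instead. The name totals is only illustrative.

```r
library(dplyr)

# Sample data from the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  Municipality = c("Atima", "Atima", "Atima", "Comayagua", "Comayagua",
                   "Comayagua", "Comayagua", "Copán", "Copán", "Copán",
                   "Copán", "Danlí"),
  Production = c(690L, 120L, 220L, 153L, 193L, 138L, 307L, 179L, 100L,
                 236L, 141L, 56L),
  Type = c("Reverification", "Reverification", "Reverification", "Initial",
           "Initial", "Initial", "Reverification", "Initial", "Initial",
           "Reverification", "Reverification", "Reverification"),
  stringsAsFactors = FALSE
)

totals <- df1 %>%
  group_by(Municipality, Type) %>%
  # .groups = "drop" returns an ungrouped result (assumes dplyr >= 1.0.0)
  summarise(Production = sum(Production), .groups = "drop") %>%
  arrange(Municipality, Type)

totals
```

Dropping the grouping is optional, but it avoids surprises if further verbs such as mutate() or slice() are applied to the result later.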