Question

In R, I have a table with the columns Location, sample_year and count. So,

Location sample_year count  
A        1995        1
A        1995        1  
A        2000        3  
B        2000        1  
B        2000        1  
B        2000        5

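I want a summary table that looks at the 'Location' and 'sample_year' columns and sums 'count' for each unique combination of the two, rather than for a single column. So the end result should be: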

Location sample_year sum_count
A        1995        2
A        2000        3
B        2000        7

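I could merge the columns into a new column to create a unique Location-sample_year key, but that is not a clean solution, especially if I need to scale it up to three columns at some point. There must be a better approach.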

Answer 1

You can use aggregate with a formula.

First the data:

x <- read.table(textConnection("Location sample_year count  
A        1995        1
A        1995        1  
A        2000        3  
B        2000        1  
B        2000        1  
B        2000        5"), header = TRUE)

Aggregate with sum, using a formula that specifies the grouping:

aggregate(count ~ Location+sample_year, data = x, sum)
  Location sample_year count
1        A        1995     2
2        A        2000     3
3        B        2000     7
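
The question also asks about scaling to three columns. With the formula interface this should only require adding another term on the right-hand side. A minimal sketch, assuming a third grouping column called species (a hypothetical name with made-up values, not part of the original data):

```r
# Same six rows as in the question, plus a hypothetical third
# grouping column `species` added purely for illustration.
x3 <- data.frame(
  Location    = c("A", "A", "A", "B", "B", "B"),
  sample_year = c(1995, 1995, 2000, 2000, 2000, 2000),
  species     = c("s1", "s1", "s1", "s1", "s2", "s2"),  # assumed values
  count       = c(1, 1, 3, 1, 1, 5)
)

# One row per unique Location/sample_year/species combination,
# with count summed within each combination.
res <- aggregate(count ~ Location + sample_year + species, data = x3, FUN = sum)
print(res)
```

Only combinations that actually occur in the data appear in the result, so no helper key column is needed.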

Answer 2

Or use the reshape package:

library(reshape)
md <- melt(x, measure.vars = "count")
cast(md, Location + sample_year ~ variable, sum)
  Location sample_year count
1        A        1995     2
2        A        2000     3
3        B        2000     7

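Edit

I am using the object x from @mdsumner's answer. In any case... I suggest you stick with his answer, since it does not depend on an external package (the aggregate function ships with R in the stats package, unless you detach it...). And, by the way, it is faster than the reshape solution.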
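
Staying with base R, tapply also accepts several grouping columns when they are passed as a list. The following is a sketch of that variant, not part of the answers above; the data frame is rebuilt inline so the block runs on its own, and the names sums and long are illustrative.

```r
# The data from the question, rebuilt so the example is self-contained.
x <- data.frame(
  Location    = c("A", "A", "A", "B", "B", "B"),
  sample_year = c(1995, 1995, 2000, 2000, 2000, 2000),
  count       = c(1, 1, 3, 1, 1, 5)
)

# Grouping by a list of columns gives a Location-by-sample_year matrix.
# Combinations that never occur (here B/1995) are filled with NA.
sums <- with(x, tapply(count,
                       list(Location = Location, sample_year = sample_year),
                       sum))

# Reshape the matrix into long form and drop the empty combinations.
long <- as.data.frame(as.table(sums), responseName = "sum_count")
long <- long[!is.na(long$sum_count), ]
print(long)
```

Unlike aggregate, tapply returns the full cross-table, so the long form needs the extra filtering step. Note that sample_year comes back as a factor after the conversion.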

Answer 3

Or use plyr (with x from @mdsumner's answer):

library(plyr)
ddply(x, .(Location,sample_year), summarise, count = sum(count))

tapply() function depending on multiple columns in R
