I have a data frame in the following format:
Site Year Month Count1 Count2 Count3 Patch
1 1 May 15 12 10 1
1 1 May 8 0 5 2
1 1 May 3 1 2 3
1 1 May 4 4 1 4
1 1 June 6 5 1 1
1 1 June 9 1 3 2
1 1 June 3 0 0 3
1 1 June 5 5 2 4
1 1 July 4 0 3 1
..........
I would like to collapse the data frame across the Patch level so that the three count variables are summed, i.e.:
Site Year Month Count1 Count2 Count3
1 1 May 30 17 18
1 1 June 23 11 6
1 1 July 4 0 3
.........
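I have looked at the aggregate and tapply commands, but they do not seem to sum across patches as required.
Can someone suggest a command to transform the data accordingly?
Thanks.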
Answer 0 (score: 3)
library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr; across() replaces them
dat %>%
  group_by(Site, Year, Month) %>%
  summarise(across(Count1:Count3, ~ sum(.x, na.rm = TRUE)))
# Source: local data frame [3 x 6]
#Groups: Site, Year
# Site Year Month Count1 Count2 Count3
# 1 1 1 July 4 0 3
# 2 1 1 June 23 11 6
# 3 1 1 May 30 17 18
Answer 1 (score: 3)
Or a data.table solution (this keeps the result in the original month order):
library(data.table)
setDT(df)[, lapply(.SD, sum),
by = list(Site, Year, Month),
.SDcols = paste0("Count", seq_len(3))]
# Site Year Month Count1 Count2 Count3
# 1: 1 1 May 30 17 18
# 2: 1 1 June 23 11 6
# 3: 1 1 July 4 0 3
Answer 2 (score: 2)
Using aggregate:
> ( a <- aggregate(.~Site+Year+Month, dat[-length(dat)], sum) )
# Site Year Month Count1 Count2 Count3
# 1 1 1 July 4 0 3
# 2 1 1 June 23 11 6
# 3 1 1 May 30 17 18
Here dat is your data.
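In the call above, dat[-length(dat)] drops the last column (Patch), so the `.` on the left-hand side of the formula expands to the three count columns. A minimal sketch of an equivalent call that names the columns explicitly, so it does not depend on Patch being last. It assumes dat holds only the nine rows shown in the question, rebuilt by hand here; the name a2 is just for illustration.

```r
# Nine sample rows copied from the question (the remaining rows are elided there)
dat <- data.frame(
  Site   = 1,
  Year   = 1,
  Month  = c(rep("May", 4), rep("June", 4), "July"),
  Count1 = c(15, 8, 3, 4, 6, 9, 3, 5, 4),
  Count2 = c(12, 0, 1, 4, 5, 1, 0, 5, 0),
  Count3 = c(10, 5, 2, 1, 1, 3, 0, 2, 3),
  Patch  = c(1, 2, 3, 4, 1, 2, 3, 4, 1)
)

# Sum the three count columns within each Site/Year/Month group;
# Patch is simply left out of the formula
a2 <- aggregate(cbind(Count1, Count2, Count3) ~ Site + Year + Month,
                data = dat, FUN = sum)
a2
```

As with the original call, the groups come back in alphabetical month order (July, June, May).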
Note that the result you posted for July appears to be incorrect.
To get the result in the original data order, you can use
> a[order(match(a$Month, unique(dat$Month))), ]
# Site Year Month Count1 Count2 Count3
# 3 1 1 May 30 17 18
# 2 1 1 June 23 11 6
# 1 1 1 July 4 0 3
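Another option, sketched here rather than taken from the answer, is to make Month a factor with calendar-ordered levels before aggregating, so no reordering is needed afterwards. This assumes the month names are spelled out in full English as in the question; month.name is a constant built into base R, and the names dat and res are only for illustration.

```r
# Nine sample rows copied from the question (the remaining rows are elided there)
dat <- data.frame(
  Site   = 1,
  Year   = 1,
  Month  = c(rep("May", 4), rep("June", 4), "July"),
  Count1 = c(15, 8, 3, 4, 6, 9, 3, 5, 4),
  Count2 = c(12, 0, 1, 4, 5, 1, 0, 5, 0),
  Count3 = c(10, 5, 2, 1, 1, 3, 0, 2, 3),
  Patch  = c(1, 2, 3, 4, 1, 2, 3, 4, 1)
)

# Calendar-ordered factor levels: aggregate() sorts groups by level order,
# not alphabetically, and leaves out months that have no rows
dat$Month <- factor(dat$Month, levels = month.name)

res <- aggregate(cbind(Count1, Count2, Count3) ~ Site + Year + Month,
                 data = dat, FUN = sum)
res
```

The rows of res then appear as May, June, July, which also stays correct when there are several sites or years.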