我正在尝试使用R中的聚合来汇总一些数据,同时使用以下数据计算其他列的值
newdata
Year HNo County ST Month Day DuckBag GooseBag
2012 264120547 LA ND 10 13 6 0
2008 264080047 EDDY ND 9 27 4 1
2013 26430119 ROLETTE ND 10 20 3 0
2006 264060447 BURKE ND 10 25 5 0
2006 264061113 BENSON ND 10 2 3 1
2012 564120139 OLIVER ND 12 15 0 3
2013 26430294 TOWNER ND 10 10 2 0
2007 564070298 LOGAN ND 9 29 0 0
2007 564070869 SHERIDAN ND 10 21 0 0
2007 564070315 CASS ND 9 2 0 0
2005 264050791 SHERIDAN ND 10 15 3 0
2012 264120240 RAMSEY ND 11 1 6 0
2013 26431021 TOWNER ND 10 20 3 0
2013 56430774 NA ND 10 9 5 2
2006 264061288 BENSON ND 10 4 5 1
2005 264051006 EDDY ND 10 17 5 2
2010 264100848 MORTON ND 10 2 0 0
2011 264110151 CASS ND 10 8 4 1
2005 264051100 WARD ND 10 9 1 0
2013 26430194 MC ND 11 1 5 0
我想在年份和月份上汇总每个组合的DuckBag和GooseBag。另外,我想计算每个年/月组合中有多少行有DuckBag或GooseBag> 0.
我可以接近这些代码,但不是我想要的。
aggregate(newdata$DuckBag,list(Year = newdata$Year, Month = newdata$Month),sum)
aggregate(DuckBag ~ Year+Month,data = newdata,FUN=function(newdata) c(total =sum(newdata), n=length(newdata) ) )
dplyr会更好吗?我看过的dplyr代码看起来更干净,但不知道从哪里开始计数。最后,虽然我确定它的要求太高,但有没有办法可以添加一个专栏,该专栏给出了该年度/月份总和所代表的相应年份总和的比例?非常感谢你。
答案 0 :(得分:6)
喜欢这样吗?
group_by(df, Year, Month) %>%
summarise_each(funs(Sum = sum(.), Positive = sum(. > 0)), DuckBag, GooseBag)
#Source: local data frame [12 x 6]
#Groups: Year
#
# Year Month DuckBag_Sum GooseBag_Sum DuckBag_Positive GooseBag_Positive
#1 2005 10 9 2 3 1
#2 2006 10 13 2 3 2
#3 2007 9 0 0 0 0
#4 2007 10 0 0 0 0
#5 2008 9 4 1 1 1
#6 2010 10 0 0 0 0
#7 2011 10 4 1 1 1
#8 2012 10 6 0 1 0
#9 2012 11 6 0 1 0
#10 2012 12 0 3 0 1
#11 2013 10 13 2 4 1
#12 2013 11 5 0 1 0
答案 1 :(得分:3)
您也可以在aggregate
中一步完成此操作。
f1 <- function(x) c(Sum=sum(x), Positive=sum(x > 0)) #(just to make it clean)
res <- do.call(data.frame,aggregate(cbind(DuckBag,GooseBag)~Year+
Month, df, FUN=f1))
res
# Year Month DuckBag.Sum DuckBag.Positive GooseBag.Sum GooseBag.Positive
#1 2007 9 0 0 0 0
#2 2008 9 4 1 1 1
#3 2005 10 9 3 2 1
#4 2006 10 13 3 2 2
#5 2007 10 0 0 0 0
#6 2010 10 0 0 0 0
#7 2011 10 4 1 1 1
#8 2012 10 6 1 0 0
#9 2013 10 13 4 2 1
#10 2012 11 6 1 0 0
#11 2013 11 5 1 0 0
#12 2012 12 0 0 3 1
答案 2 :(得分:2)
这是我的看法:
library(dplyr)
results <- df %>%
group_by(Year, Month) %>%
summarise(Duck.Bag.Total = sum(DuckBag),
Goose.Bag.Total = sum(GooseBag),
Total.Sum = sum(Duck.Bag.Total, Goose.Bag.Total)) %>%
mutate(Duck.or.Goose.Positive = Duck.Bag.Total > 0 | Goose.Bag.Total > 0)
results
# Year Month Duck.Bag.Total Goose.Bag.Total Total.Sum Duck.or.Goose.Positive
# 1 2005 10 9 2 11 TRUE
# 2 2006 10 13 2 15 TRUE
# 3 2007 9 0 0 0 FALSE
# 4 2007 10 0 0 0 FALSE
# 5 2008 9 4 1 5 TRUE
# 6 2010 10 0 0 0 FALSE
# 7 2011 10 4 1 5 TRUE
# 8 2012 10 6 0 6 TRUE
# 9 2012 11 6 0 6 TRUE
# 10 2012 12 0 3 3 TRUE
# 11 2013 10 13 2 15 TRUE
# 12 2013 11 5 0 5 TRUE
第二部分:
results2 <- results %>%
group_by(Year) %>%
summarise(Total.for.Year = sum(Total.Sum)) %>%
mutate(prop = Total.for.Year / sum(Total.for.Year))
results2
# Year Total.for.Year prop
# 1 2005 11 0.15492958
# 2 2006 15 0.21126761
# 3 2007 0 0.00000000
# 4 2008 5 0.07042254
# 5 2010 0 0.00000000
# 6 2011 5 0.07042254
# 7 2012 15 0.21126761
# 8 2013 20 0.28169014