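Aggregate and count in one step in R

Time: 2014-12-19 20:44:43

Tags: r dplyr

I am trying to use aggregate in R to summarize some data while also computing values for additional columns, using the following data: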

newdata

Year    HNo County  ST  Month   Day DuckBag GooseBag
2012    264120547   LA  ND  10  13  6   0
2008    264080047   EDDY    ND  9   27  4   1
2013    26430119    ROLETTE ND  10  20  3   0
2006    264060447   BURKE   ND  10  25  5   0
2006    264061113   BENSON  ND  10  2   3   1
2012    564120139   OLIVER  ND  12  15  0   3
2013    26430294    TOWNER  ND  10  10  2   0
2007    564070298   LOGAN   ND  9   29  0   0
2007    564070869   SHERIDAN    ND  10  21  0   0
2007    564070315   CASS    ND  9   2   0   0
2005    264050791   SHERIDAN    ND  10  15  3   0
2012    264120240   RAMSEY  ND  11  1   6   0
2013    26431021    TOWNER  ND  10  20  3   0
2013    56430774    NA  ND  10  9   5   2
2006    264061288   BENSON  ND  10  4   5   1
2005    264051006   EDDY    ND  10  17  5   2
2010    264100848   MORTON  ND  10  2   0   0
2011    264110151   CASS    ND  10  8   4   1
2005    264051100   WARD    ND  10  9   1   0
2013    26430194    MC  ND  11  1   5   0

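I want to sum DuckBag and GooseBag for each combination of Year and Month. In addition, I want to count how many rows in each Year/Month combination have DuckBag or GooseBag > 0.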

I can get close with the following code, but it is not quite what I want.

aggregate(newdata$DuckBag, list(Year = newdata$Year, Month = newdata$Month), sum)
aggregate(DuckBag ~ Year + Month, data = newdata,
          FUN = function(x) c(total = sum(x), n = length(x)))

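Would dplyr be better? The dplyr code I have seen looks cleaner, but I don't know where to start with the counts. Finally, although I'm sure this is asking too much, is there a way to add a column giving the proportion of the corresponding year's total that each Year/Month sum represents? Thank you very much.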

3 answers:

Answer 0: (score: 6)

Like this?

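library(dplyr)

# Per Year/Month: the sum of each bag column and the number of rows with a value > 0
group_by(df, Year, Month) %>%
  summarise_each(funs(Sum = sum(.), Positive = sum(. > 0)), DuckBag, GooseBag)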
#Source: local data frame [12 x 6]
#Groups: Year
#
#   Year Month DuckBag_Sum GooseBag_Sum DuckBag_Positive GooseBag_Positive
#1  2005    10           9            2                3                 1
#2  2006    10          13            2                3                 2
#3  2007     9           0            0                0                 0
#4  2007    10           0            0                0                 0
#5  2008     9           4            1                1                 1
#6  2010    10           0            0                0                 0
#7  2011    10           4            1                1                 1
#8  2012    10           6            0                1                 0
#9  2012    11           6            0                1                 0
#10 2012    12           0            3                0                 1
#11 2013    10          13            2                4                 1
#12 2013    11           5            0                1                 0
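
In current dplyr releases, summarise_each() and funs() are deprecated. The same summary can be written with across(); the following is a minimal sketch, assuming dplyr 1.0.0 or later. The data frame is rebuilt from the question's table using only the four columns involved, and the name `res` is chosen here for illustration. Note that across() orders the result columns by input column (DuckBag_Sum, DuckBag_Positive, GooseBag_Sum, GooseBag_Positive) rather than by function as in the output above.

```r
library(dplyr)

# Only the columns needed here, copied from the question's data
df <- data.frame(
  Year     = c(2012, 2008, 2013, 2006, 2006, 2012, 2013, 2007, 2007, 2007,
               2005, 2012, 2013, 2013, 2006, 2005, 2010, 2011, 2005, 2013),
  Month    = c(10, 9, 10, 10, 10, 12, 10, 9, 10, 9,
               10, 11, 10, 10, 10, 10, 10, 10, 10, 11),
  DuckBag  = c(6, 4, 3, 5, 3, 0, 2, 0, 0, 0, 3, 6, 3, 5, 5, 5, 0, 4, 1, 5),
  GooseBag = c(0, 1, 0, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 2, 1, 2, 0, 1, 0, 0)
)

res <- df %>%
  group_by(Year, Month) %>%
  summarise(
    # Apply both functions to both columns; names become <column>_<function>
    across(c(DuckBag, GooseBag),
           list(Sum = sum, Positive = ~ sum(.x > 0))),
    .groups = "drop"
  )

res
```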

Answer 1: (score: 3)

You can also do this in one step with aggregate.

f1 <- function(x) c(Sum = sum(x), Positive = sum(x > 0))  # helper, just to keep the call clean
res <- do.call(data.frame,
               aggregate(cbind(DuckBag, GooseBag) ~ Year + Month, df, FUN = f1))
res
#   Year Month DuckBag.Sum DuckBag.Positive GooseBag.Sum GooseBag.Positive
#1  2007     9           0                0            0                 0
#2  2008     9           4                1            1                 1
#3  2005    10           9                3            2                 1
#4  2006    10          13                3            2                 2
#5  2007    10           0                0            0                 0
#6  2010    10           0                0            0                 0
#7  2011    10           4                1            1                 1
#8  2012    10           6                1            0                 0
#9  2013    10          13                4            2                 1
#10 2012    11           6                1            0                 0
#11 2013    11           5                1            0                 0
#12 2012    12           0                0            3                 1

Answer 2: (score: 2)

Here's my take:

library(dplyr)
results <- df %>%
              group_by(Year, Month) %>%
              summarise(Duck.Bag.Total  = sum(DuckBag),
                        Goose.Bag.Total = sum(GooseBag),
                        Total.Sum       = sum(Duck.Bag.Total, Goose.Bag.Total)) %>%
              mutate(Duck.or.Goose.Positive = Duck.Bag.Total > 0 | Goose.Bag.Total > 0)

results
#    Year Month Duck.Bag.Total Goose.Bag.Total Total.Sum Duck.or.Goose.Positive
# 1  2005    10              9               2        11                   TRUE
# 2  2006    10             13               2        15                   TRUE
# 3  2007     9              0               0         0                  FALSE
# 4  2007    10              0               0         0                  FALSE
# 5  2008     9              4               1         5                   TRUE
# 6  2010    10              0               0         0                  FALSE
# 7  2011    10              4               1         5                   TRUE
# 8  2012    10              6               0         6                   TRUE
# 9  2012    11              6               0         6                   TRUE
# 10 2012    12              0               3         3                   TRUE
# 11 2013    10             13               2        15                   TRUE
# 12 2013    11              5               0         5                   TRUE

For the second part (each year's share of the overall total):

results2 <- results %>%
              group_by(Year) %>%
              summarise(Total.for.Year = sum(Total.Sum)) %>%
              mutate(prop = Total.for.Year / sum(Total.for.Year))

results2
#   Year Total.for.Year       prop
# 1 2005             11 0.15492958
# 2 2006             15 0.21126761
# 3 2007              0 0.00000000
# 4 2008              5 0.07042254
# 5 2010              0 0.00000000
# 6 2011              5 0.07042254
# 7 2012             15 0.21126761
# 8 2013             20 0.28169014
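
The question also asked for a count of rows where either bag is positive, and for each Year/Month total as a share of that year's total (results2 above gives each year's share of the grand total instead). A minimal sketch of both, assuming dplyr 1.0.0 or later; the data frame is rebuilt from the question's table, and the names `res2`, `Any_Positive`, `Month_Total` and `Prop_of_Year` are invented here for illustration.

```r
library(dplyr)

# Only the columns needed here, copied from the question's data
df <- data.frame(
  Year     = c(2012, 2008, 2013, 2006, 2006, 2012, 2013, 2007, 2007, 2007,
               2005, 2012, 2013, 2013, 2006, 2005, 2010, 2011, 2005, 2013),
  Month    = c(10, 9, 10, 10, 10, 12, 10, 9, 10, 9,
               10, 11, 10, 10, 10, 10, 10, 10, 10, 11),
  DuckBag  = c(6, 4, 3, 5, 3, 0, 2, 0, 0, 0, 3, 6, 3, 5, 5, 5, 0, 4, 1, 5),
  GooseBag = c(0, 1, 0, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 2, 1, 2, 0, 1, 0, 0)
)

res2 <- df %>%
  group_by(Year, Month) %>%
  summarise(
    DuckBag_Sum  = sum(DuckBag),
    GooseBag_Sum = sum(GooseBag),
    # Number of rows in the group where at least one bag is > 0
    Any_Positive = sum(DuckBag > 0 | GooseBag > 0),
    .groups = "drop"
  ) %>%
  group_by(Year) %>%
  mutate(
    Month_Total  = DuckBag_Sum + GooseBag_Sum,
    # Share of the year's total; NaN when the whole year sums to 0
    Prop_of_Year = Month_Total / sum(Month_Total)
  ) %>%
  ungroup()

res2
```

Years whose total is 0 (2007 and 2010 in this data) produce NaN in Prop_of_Year because of the 0/0 division; replace those values afterwards if a 0 is preferred.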