Sum values based on a condition and group

Time: 2019-09-17 19:11:10

Tags: r group-by sum aggregate

Suppose I have a data set containing records like the following:

| id  |    Date   |  Name | Amount |
|-----|:---------:|------:|--------|
| 400 | 1/29/2019 | Chris | 3777   |
| 400 | 1/29/2019 | Chris | 4013   |
| 45  | 4/4/2019  | Patty | 3010   |
| 45  | 4/4/2019  | Patty | 2050   |

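I need to get the records whose total Amount for each name, id and date (that is, grouped by name, id and date) is greater than 5000. In this data, 'Chris' with id 400 on 1/29/2019 and 'Patty' with id 45 on 4/4/2019 both have totals above 5000 (3777 + 4013 > 5000 and 3010 + 2050 > 5000). Thus, my output should be the four rows shown in the table above. The full data set is: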

df <- data.frame(id = c("59","59","59","45","45","400","400","45","45"),
                 Date = c("11/29/2018","11/29/2018","11/29/2018","2/13/2019","2/13/2019",
                        "1/29/2019","1/29/2019","4/4/2019","4/4/2019"), 
                 Name = c("Chang", "Chang", "Chang", "Lin", "Lin", "Chris", "Chris", "Patty", "Patty"), 
                 Amount = c("958","1158","595","922","922","3777","4013","3010","2050"), stringsAsFactors = FALSE)
df$Date <- as.Date(df$Date, '%m/%d/%Y')
df$Amount <- as.numeric(df$Amount)

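library(dplyr)  # provides %>% and arrange()

# Total Amount per Name / Id / Date; aggregate() names the summed column "x"
df_sum <- aggregate(df$Amount, 
                    by = list(Name = df$Name, 
                              Id = df$id, 
                              Date = df$Date), 
                    FUN = sum) %>% 
             arrange(Name, Id, Date)
# Keep only the groups whose total is greater than 5000
df_sum <- subset(df_sum, x > 5000)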

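I used the aggregate function (the code above), but it returns the aggregated values. What I am looking for is a way to extract the records that add up to more than 5000, not the totals themselves.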


1 answer:

Answer 0: (score: 1)

One option is to group by 'id' and 'Name', then filter to keep the rows of every group whose sum of 'Amount' is greater than 5000:

library(dplyr)
df %>% 
    group_by(id, Name) %>%
    filter(sum(Amount) > 5000)
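The question asks for grouping by name, id and date, while the snippet above groups by id and Name only. For the sample data both give the same rows, but a minimal sketch that also groups by Date might look like this. The data frame is rebuilt from the question so the block runs on its own; `result` is just an illustrative name, and the final `ungroup()` is an optional extra, not part of the original answer.

```r
library(dplyr)

# Sample data from the question, with Date and Amount already converted
df <- data.frame(
  id = c("59", "59", "59", "45", "45", "400", "400", "45", "45"),
  Date = as.Date(c("11/29/2018", "11/29/2018", "11/29/2018", "2/13/2019", "2/13/2019",
                   "1/29/2019", "1/29/2019", "4/4/2019", "4/4/2019"), "%m/%d/%Y"),
  Name = c("Chang", "Chang", "Chang", "Lin", "Lin", "Chris", "Chris", "Patty", "Patty"),
  Amount = c(958, 1158, 595, 922, 922, 3777, 4013, 3010, 2050),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(id, Name, Date) %>%     # Date added to match the grouping in the question
  filter(sum(Amount) > 5000) %>%   # keep every row of a group whose total exceeds 5000
  ungroup()                        # drop the grouping so later steps see a plain tibble
```

Because `filter()` is evaluated once per group, `sum(Amount)` is the group total, so whole groups are kept or dropped together and the original rows are returned rather than the totals.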

In base R, ave can be used:

df[with(df, ave(Amount, id, Name, FUN = sum) >5000),]
#   id       Date  Name Amount
#6 400 2019-01-29 Chris   3777
#7 400 2019-01-29 Chris   4013
#8  45 2019-04-04 Patty   3010
#9  45 2019-04-04 Patty   2050
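To see why this works, it may help to look at the intermediate vector: `ave()` returns one value per row, namely the result of `FUN` for the group that row belongs to. The sketch below rebuilds the question's data and, as an assumption beyond the original answer, adds Date as a third grouping variable to match the question; `group_total` and `kept` are illustrative names.

```r
# Sample data from the question, with Date and Amount already converted
df <- data.frame(
  id = c("59", "59", "59", "45", "45", "400", "400", "45", "45"),
  Date = as.Date(c("11/29/2018", "11/29/2018", "11/29/2018", "2/13/2019", "2/13/2019",
                   "1/29/2019", "1/29/2019", "4/4/2019", "4/4/2019"), "%m/%d/%Y"),
  Name = c("Chang", "Chang", "Chang", "Lin", "Lin", "Chris", "Chris", "Patty", "Patty"),
  Amount = c(958, 1158, 595, 922, 922, 3777, 4013, 3010, 2050),
  stringsAsFactors = FALSE
)

# One group total per row, aligned with the rows of df
group_total <- ave(df$Amount, df$id, df$Name, df$Date, FUN = sum)

# Logical indexing keeps the original rows whose group total exceeds 5000
kept <- df[group_total > 5000, ]
```

Here `group_total` has the same length as `df`, so comparing it with 5000 gives a row-wise logical mask, which is what lets the original records be extracted instead of the aggregated values.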