Suppose I have a dataset similar to the following:
| id | Date | Name | Amount |
|-----|:---------:|------:|--------|
| 400 | 1/29/2019 | Chris | 3777 |
| 400 | 1/29/2019 | Chris | 4013 |
| 45 | 4/4/2019 | Patty | 3010 |
| 45 | 4/4/2019 | Patty | 2050 |
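I need to get the records for which the sum of Amount per Name, id and Date (that is, grouped by Name, id and Date) is greater than 5000. In the dataset above, 'Chris' with id 400 on 1/29/2019 and 'Patty' with id 45 on 4/4/2019 both have sums greater than 5000 (3777 + 4013 > 5000 and 3010 + 2050 > 5000), so my output should be those four rows. The full data and my attempt: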
library(dplyr)  # provides %>% and arrange()

# Sample data
df <- data.frame(id = c("59","59","59","45","45","400","400","45","45"),
                 Date = c("11/29/2018","11/29/2018","11/29/2018","2/13/2019","2/13/2019",
                          "1/29/2019","1/29/2019","4/4/2019","4/4/2019"),
                 Name = c("Chang", "Chang", "Chang", "Lin", "Lin", "Chris", "Chris", "Patty", "Patty"),
                 Amount = c("958","1158","595","922","922","3777","4013","3010","2050"),
                 stringsAsFactors = FALSE)
df$Date <- as.Date(df$Date, "%m/%d/%Y")
df$Amount <- as.numeric(df$Amount)

# Attempt: returns one row per group, with the group total in column x
df_sum <- aggregate(df$Amount,
                    by = list(Name = df$Name,
                              Id = df$id,
                              Date = df$Date),
                    FUN = sum) %>%
  arrange(Name, Id, Date)
df_sum <- subset(df_sum, x > 5000)
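I used the aggregate function (the code above), but it returns the aggregated values. I am looking for a way to extract the individual records whose amounts add up to more than 5000, not the totals.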
Answer 0 (score: 1)
One option is to group by 'id' and 'Name', then filter the rows based on whether the sum of 'Amount' in each group is greater than 5000:
library(dplyr)
df %>%
group_by(id, Name) %>%
filter(sum(Amount) > 5000)
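The question also groups by Date, so the same filter can include Date in the grouping. A minimal sketch, assuming the data from the question with Amount already numeric (it is rebuilt here so the block runs on its own; `res` is just an illustrative name):

```r
library(dplyr)

# Rebuild the question's data so the example is self-contained
df <- data.frame(
  id = c("59", "59", "59", "45", "45", "400", "400", "45", "45"),
  Date = as.Date(c("11/29/2018", "11/29/2018", "11/29/2018", "2/13/2019", "2/13/2019",
                   "1/29/2019", "1/29/2019", "4/4/2019", "4/4/2019"), "%m/%d/%Y"),
  Name = c("Chang", "Chang", "Chang", "Lin", "Lin", "Chris", "Chris", "Patty", "Patty"),
  Amount = c(958, 1158, 595, 922, 922, 3777, 4013, 3010, 2050),
  stringsAsFactors = FALSE
)

# Keep every row that belongs to an (id, Name, Date) group totalling more than 5000
res <- df %>%
  group_by(id, Name, Date) %>%
  filter(sum(Amount) > 5000) %>%
  ungroup()
```

On this sample the result is the same four Chris and Patty rows. Adding Date only matters when one id/Name pair has records on more than one date.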
In base R, ave can be used:
df[with(df, ave(Amount, id, Name, FUN = sum) >5000),]
# id Date Name Amount
#6 400 2019-01-29 Chris 3777
#7 400 2019-01-29 Chris 4013
#8 45 2019-04-04 Patty 3010
#9 45 2019-04-04 Patty 2050
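For grouping by Date as well, two base R variants are sketched below, again assuming the question's data with numeric Amount (rebuilt here for completeness). The first adds Date to the ave call. The second continues the aggregate attempt from the question by merging the qualifying groups back onto the original rows. The names `grp_sum`, `res_ave`, `sums`, `keep` and `res_merge` are illustrative only.

```r
# Rebuild the question's data so the example is self-contained
df <- data.frame(
  id = c("59", "59", "59", "45", "45", "400", "400", "45", "45"),
  Date = as.Date(c("11/29/2018", "11/29/2018", "11/29/2018", "2/13/2019", "2/13/2019",
                   "1/29/2019", "1/29/2019", "4/4/2019", "4/4/2019"), "%m/%d/%Y"),
  Name = c("Chang", "Chang", "Chang", "Lin", "Lin", "Chris", "Chris", "Patty", "Patty"),
  Amount = c(958, 1158, 595, 922, 922, 3777, 4013, 3010, 2050),
  stringsAsFactors = FALSE
)

# Variant 1: ave() returns the group total repeated on every row of the group
grp_sum <- with(df, ave(Amount, id, Name, Date, FUN = sum))
res_ave <- df[grp_sum > 5000, ]

# Variant 2: aggregate to one row per group, keep the qualifying groups,
# then merge back onto df to recover the individual records
sums <- aggregate(Amount ~ id + Name + Date, data = df, FUN = sum)
keep <- sums[sums$Amount > 5000, c("id", "Name", "Date")]
res_merge <- merge(df, keep, by = c("id", "Name", "Date"))
```

Both variants keep the individual records rather than the totals. Note that merge may reorder the rows by the key columns, whereas the ave version preserves the original row order and row names.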