I have this data frame in R:
raw_payment_id from_bank_account amount posted_at
<int> <chr> <dbl> <date>
1 620691 SK660900000000062087 20.0 2018-02-25
2 618433 SK660900000000062087 10.0 2018-02-27
3 623157 SK660900000000062087 10.0 2018-03-02
4 628236 SK300900000000506871 812. 2018-03-06
5 627899 SK300900000000506871 812. 2018-03-07
6 628966 SK660900000000062087 10.0 2018-03-09
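My goal is to determine whether payments from the same account and with the same amount were posted within 3 days of each other. If so, both payments should be flagged with 1. The result should look like this: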
raw_payment_id from_bank_account amount posted_at test
<int> <chr> <dbl> <date> <int>
1 620691 SK660900000000062087 20.0 2018-02-25 0
2 618433 SK660900000000062087 10.0 2018-02-27 1
3 623157 SK660900000000062087 10.0 2018-03-02 1
4 628236 SK300900000000506871 812. 2018-03-06 1
5 627899 SK300900000000506871 812. 2018-03-07 1
6 628966 SK660900000000062087 10.0 2018-03-09 0
I could not find a way to do this. My attempts with lag/lead failed because a bank account may have only one payment.
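To make the threshold concrete, here is a small sketch that computes the gaps between the three 10.0 payments from account SK660900000000062087, using the dates from the table above. The variable name `posted_at_10` is only illustrative.

```r
# Dates of the three 10.0 payments from the same account (taken from the example data)
posted_at_10 <- as.Date(c("2018-02-27", "2018-03-02", "2018-03-09"))

# Gap in days between consecutive payments
gaps <- as.numeric(diff(posted_at_10))
print(gaps)
```

The first gap is exactly 3 days and those two payments are flagged in the expected output, so "within 3 days" is read here as a gap of at most 3 days.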
Answer 0 (score: 1)
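library(dplyr)
# lag()/lead() compare neighbouring rows only, so rows must be sorted by posted_at
df %>%
  group_by(from_bank_account, amount) %>%
  mutate(var = case_when(
    # within 3 days of the previous or the next payment in the same group
    abs(as.Date(posted_at) - as.Date(lag(posted_at))) < 4 ~ 1,
    abs(as.Date(posted_at) - as.Date(lead(posted_at))) < 4 ~ 1,
    # otherwise, including groups that contain a single payment
    TRUE ~ 0
  ))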
raw_payment_id from_bank_account amount posted_at var
<int> <fct> <dbl> <fct> <dbl>
1 620691 SK660900000000062087 20. 2018-02-25 0.
2 618433 SK660900000000062087 10. 2018-02-27 1.
3 623157 SK660900000000062087 10. 2018-03-02 1.
4 628236 SK300900000000506871 812. 2018-03-06 1.
5 627899 SK300900000000506871 812. 2018-03-07 1.
6 628966 SK660900000000062087 10. 2018-03-09 0.
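The lag/lead comparison only looks at the neighbouring rows of each group, so it depends on the rows being ordered by date. Below is a hedged sketch of the same idea that sorts inside each group first. The input is the example data in a deliberately shuffled order, and the names `df_shuffled`, `gap_prev`, `gap_next` and `flagged` are assumptions made for illustration.

```r
library(dplyr)

# The six example payments, then shuffled to show that row order no longer matters
df <- data.frame(
  raw_payment_id = c(620691L, 618433L, 623157L, 628236L, 627899L, 628966L),
  from_bank_account = c("SK660900000000062087", "SK660900000000062087",
                        "SK660900000000062087", "SK300900000000506871",
                        "SK300900000000506871", "SK660900000000062087"),
  amount = c(20, 10, 10, 812, 812, 10),
  posted_at = as.Date(c("2018-02-25", "2018-02-27", "2018-03-02",
                        "2018-03-06", "2018-03-07", "2018-03-09")),
  stringsAsFactors = FALSE
)
df_shuffled <- df[c(6, 3, 1, 5, 2, 4), ]

flagged <- df_shuffled %>%
  group_by(from_bank_account, amount) %>%
  arrange(posted_at, .by_group = TRUE) %>%   # sort by date inside each group
  mutate(
    gap_prev = as.numeric(posted_at - lag(posted_at)),   # days since previous payment
    gap_next = as.numeric(lead(posted_at) - posted_at),  # days until next payment
    # NA gaps (first/last row, single-payment groups) count as "not close"
    var = as.integer(coalesce(gap_prev <= 3, FALSE) | coalesce(gap_next <= 3, FALSE))
  ) %>%
  ungroup() %>%
  select(-gap_prev, -gap_next) %>%
  arrange(posted_at)

print(flagged)
```

Once a group is sorted by date, the closest other payment is always a direct neighbour, so checking the previous and next rows is enough.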
Answer 1 (score: 0)
library(dplyr)
# Within each account, count how many transactions share the same amount
tmp <- mydat %>%
  group_by(from_bank_account, amount) %>%
  mutate(number_of_dupes = n()) %>%
  filter(number_of_dupes > 1) %>% # only keep candidate duplicates
  ungroup()
# Flag rows that have another payment from the same account, with the same
# amount, posted at most 3 days apart
tmp$dup <- 0
for (i in seq_len(nrow(tmp))) {
  same_pair <- tmp$from_bank_account == tmp$from_bank_account[i] &
    tmp$amount == tmp$amount[i]
  close_in_time <- abs(as.numeric(difftime(tmp$posted_at[i], tmp$posted_at, units = "days"))) < 4
  # every row matches itself, so require more than one match
  if (sum(same_pair & close_in_time) > 1) {
    tmp$dup[i] <- 1
  }
}
tmp <- tmp[tmp$dup == 1, ]
mydat$flag_duplicate <- ifelse(mydat$raw_payment_id %in% tmp$raw_payment_id, 1, 0)
raw_payment_id from_bank_account amount posted_at flag_duplicate
1 620691 SK660900000000062087 20 2018-02-25 0
2 618433 SK660900000000062087 10 2018-02-27 1
3 623157 SK660900000000062087 10 2018-03-02 1
4 628236 SK300900000000506871 812 2018-03-06 1
5 627899 SK300900000000506871 812 2018-03-07 1
6 628966 SK660900000000062087 10 2018-03-09 0
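The row-by-row loop can also be written without an explicit loop. The following is only a sketch in base R, assuming `posted_at` is a `Date` column. The helper `flag_close` and its `window` argument are hypothetical names introduced here, not part of the answer above.

```r
# The six example payments from the question
mydat <- data.frame(
  raw_payment_id = c(620691L, 618433L, 623157L, 628236L, 627899L, 628966L),
  from_bank_account = c("SK660900000000062087", "SK660900000000062087",
                        "SK660900000000062087", "SK300900000000506871",
                        "SK300900000000506871", "SK660900000000062087"),
  amount = c(20, 10, 10, 812, 812, 10),
  posted_at = as.Date(c("2018-02-25", "2018-02-27", "2018-03-02",
                        "2018-03-06", "2018-03-07", "2018-03-09")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for the dates of one (account, amount) group, given as
# numeric day counts, return 1 where another payment is at most `window` days away
flag_close <- function(days, window = 3) {
  if (length(days) < 2) return(rep(0, length(days)))  # empty or single-payment group
  gaps <- abs(outer(days, days, "-"))  # all pairwise gaps in days
  diag(gaps) <- Inf                    # ignore the gap of a payment to itself
  as.numeric(apply(gaps, 1, min) <= window)
}

# ave() applies the helper within each account/amount combination
mydat$flag_duplicate <- ave(as.numeric(mydat$posted_at),
                            mydat$from_bank_account, mydat$amount,
                            FUN = flag_close)
print(mydat)
```

The pairwise matrix grows quadratically with the size of a group, which is fine for a handful of payments per account and amount but may be slow for very large groups.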