This is a follow-up to my question here.
This is my transaction data:
data
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
19521 6644 6934 2005-01-01 700.0
19524 6753 8456 2005-01-01 600.0
19523 9242 9333 2005-01-01 1000.0
… … … … …
1055597 9866 9736 2010-12-31 278.9
1053519 9868 8644 2010-12-31 242.8
1052790 9869 8399 2010-12-31 372.2
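Now, for each account in the "from" column, I want to compute how much transaction amount that account had received in the 6 months before each of its transactions. To do this: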
library(data.table)
library(dplyr)

df <- data # df is just a copy of "data"
# for each receiving account ("to"), sum the amounts received in the 180 days up to each transaction date
setDT(df)[, total_trx_amount_received_in_last_6month := sapply(date, function(x)
  sum(amount[between(date, x - 180, x)])), by = to]
# since I want to merge "df" and "data" based on the columns "from" and "date", I change the name of the column "to" and make it "from"
df <- select(df, to, date, total_trx_amount_received_in_last_6month) %>% rename(from = to)
df
from date total_trx_amount_received_in_last_6month
<fctr> <date> <dbl>
7468 2005-01-04 700.0
6213 2005-01-08 12032.0
7517 2005-01-10 1000.0
6143 2005-01-12 4976.0
6254 2005-01-14 200.0
6669 2005-01-20 200.0
6934 2005-01-24 72160.0
9240 2005-01-26 21061.0
6374 2005-01-30 1000.0
6143 2005-01-31 4989.4
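Now I want to add this new column, total_trx_amount_received_in_last_6month, to the original data. So I need to merge the two data frames, data and df, on the columns "from" and "date", but the matching condition for the date is a range of values rather than a single value. For example, for account 7468, if the original data contains a transaction from 7468 whose date falls in the interval "2004-07-08" to "2005-01-04" (the 6 months leading up to "2005-01-04"), then the corresponding value 700.0 in df$total_trx_amount_received_in_last_6month should be added to data$total_trx_amount_received_in_last_6month.
How can I do this?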
Answer 0 (score: 0)
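There isn't enough data to test this, but you can join the two data frames and then replace total_trx_amount_received_in_last_6month with NA wherever the difference between the two dates is greater than 180 days.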
library(dplyr)
# after the join, date.x is the date from data and date.y is the date from df
data %>%
  left_join(df, by = 'from') %>%
  mutate(total_trx_amount_received_in_last_6month = replace(
    total_trx_amount_received_in_last_6month,
    (date.y - date.x) > 180, NA))
Using data.table, you can do the following:
library(data.table)
setDT(data)
df1 <- df[data, on = 'from']
df1[, total_trx_amount_received_in_last_6month := replace(
  total_trx_amount_received_in_last_6month,
  (date - i.date) > 180, NA)]
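Since the date has to match on a range rather than a single value, a data.table non-equi join may be another option. The following is only a sketch on made-up toy data, not the approach from the answer above: the account labels, amounts, and the helper column window_start are assumptions introduced for illustration. It joins data to itself so that, for each transaction, the amounts received by the sending account within the preceding 180 days are summed directly.

```r
library(data.table)

# Toy data (hypothetical values, not taken from the question)
data <- data.table(
  id     = 1:5,
  from   = c("A", "B", "A", "C", "B"),
  to     = c("B", "A", "C", "A", "A"),
  date   = as.Date(c("2005-01-01", "2005-02-01", "2005-03-01",
                     "2005-09-15", "2005-10-01")),
  amount = c(100, 50, 30, 20, 10)
)

# Start of the 180-day look-back window for each transaction
# (window_start is a helper column assumed for this sketch)
data[, window_start := date - 180]

# Non-equi self-join: for every row of the inner `data` (i), match rows of the
# outer `data` (x) where x.to == i.from and i.window_start <= x.date <= i.date,
# then sum the matched amounts once per row of i
received <- data[data,
  on = .(to = from, date >= window_start, date <= date),
  .(received = sum(x.amount, na.rm = TRUE)),
  by = .EACHI]

# by = .EACHI returns one row per row of i, in the same order as `data`
data[, total_trx_amount_received_in_last_6month := received$received]
data[, window_start := NULL]
print(data)
```

Rows with no inbound transaction in the window get 0 here because of na.rm = TRUE; dropping that argument would leave them as NA instead.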