How do I merge two data frames based on a condition?

Time: 2020-09-02 00:46:17

Tags: r dataframe date aggregation

This is a follow-up to my earlier question.

This is my transaction data:

data 

id          from    to          date        amount  
<int>       <fctr>  <fctr>      <date>      <dbl>
19521       6644    6934        2005-01-01  700.0
19524       6753    8456        2005-01-01  600.0
19523       9242    9333        2005-01-01  1000.0
…           …       …           …           …
1055597     9866    9736        2010-12-31  278.9
1053519     9868    8644        2010-12-31  242.8
1052790     9869    8399        2010-12-31  372.2

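Now, for each account in the from column, I want to keep track of how much transaction amount it received over the previous 6 months at the time each transaction was made. To do that: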

df <- data # df is just a copy of "data"
setDT(df)[, total_trx_amount_received_in_last_6month := sapply(date, function(x) 
                         sum(amount[between(date, x-180, x)])), to] 

# since I want to merge "df" and "data" based on the columns "from" and "date", I change the name of the column "to" and make it "from"
df <- select(df, to,date,total_trx_amount_received_in_last_6month) %>% rename(from=to)

df

from    date        total_trx_amount_received_in_last_6month
<fctr>  <date>      <dbl>
7468    2005-01-04  700.0       
6213    2005-01-08  12032.0     
7517    2005-01-10  1000.0      
6143    2005-01-12  4976.0      
6254    2005-01-14  200.0       
6669    2005-01-20  200.0       
6934    2005-01-24  72160.0     
9240    2005-01-26  21061.0     
6374    2005-01-30  1000.0      
6143    2005-01-31  4989.4  
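
To make the windowed sum above concrete, here is a minimal sketch on made-up data. The table toy, its values, and the column name received_6m are assumptions for illustration only, not part of the real data set. It applies the same sapply/between pattern, grouped by the receiving account.

```r
library(data.table)

# Made-up transactions (hypothetical, not from the real data)
toy <- data.table(
  from   = c("A", "B", "A", "C"),
  to     = c("B", "A", "B", "B"),
  date   = as.Date(c("2005-01-01", "2005-02-01", "2005-03-01", "2005-12-01")),
  amount = c(100, 50, 200, 10)
)

# For each receiving account (`to`), sum the amounts it received in the
# 180 days up to and including each of its transaction dates
toy[, received_6m := sapply(date, function(x) {
  sum(amount[between(date, x - 180, x)])
}), by = to]

print(toy)
```

Account B receives 100 on 2005-01-01 and 200 on 2005-03-01, so the second row for B sums both; the payment on 2005-12-01 falls more than 180 days later and stands alone.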

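Now I want to add this new column, total_trx_amount_received_in_last_6month, to the original data. So I should merge the two data frames data and df by the columns from and date, but the matching criterion for the date is a range of values, not a single value. For example, for account 7468, if the original data contains a transaction made by 7468 and the transaction date falls into the interval "2004-07-08" to "2005-01-04" (the last 6 months, counting back from "2005-01-04"), then the corresponding value of 700.0 in df$total_trx_amount_received_in_last_6month should be added to data$total_trx_amount_received_in_last_6month.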

How can I do this?

1 Answer:

Answer 0: (score: 0)

I don't have enough data to test this, but you can join the two data frames and replace total_trx_amount_received_in_last_6month with NA where the difference between the two dates is greater than 180 days.

library(dplyr)

data %>%
  left_join(df, by = 'from') %>%
  mutate(total_trx_amount_received_in_last_6month = replace(
    total_trx_amount_received_in_last_6month,
    (date.y - date.x) > 180, NA))
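
Note that the condition above only checks the upper bound: if a date in df is earlier than the transaction date in data, the difference is negative and the value is kept. A minimal sketch that checks both ends of the window, assuming made-up tables data_toy and df_toy and a helper column gap_days (all hypothetical), might look like this:

```r
library(dplyr)

# Made-up inputs (hypothetical, for illustration only)
data_toy <- data.frame(
  id   = 1:3,
  from = c("A", "A", "B"),
  date = as.Date(c("2005-01-02", "2005-06-01", "2004-01-01")),
  stringsAsFactors = FALSE
)
df_toy <- data.frame(
  from = c("A", "B"),
  date = as.Date(c("2005-01-04", "2005-01-08")),
  total_trx_amount_received_in_last_6month = c(700, 12032),
  stringsAsFactors = FALSE
)

res <- data_toy %>%
  left_join(df_toy, by = "from") %>%
  mutate(
    # days from the transaction date (date.x) to the df date (date.y)
    gap_days = as.numeric(difftime(date.y, date.x, units = "days")),
    # keep the value only when the transaction falls inside the 180-day window
    total_trx_amount_received_in_last_6month = replace(
      total_trx_amount_received_in_last_6month,
      gap_days < 0 | gap_days > 180, NA)
  )

print(res)
```

Only the first row keeps its value (a gap of 2 days); the second has a negative gap and the third a gap of more than 180 days, so both become NA. On the real data, joining on from alone can produce many rows per transaction, so the result may still need to be reduced to one row per id.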

With data.table, you can do the following:

library(data.table)
setDT(data)
df1 <- df[data, on = 'from']

df1[, total_trx_amount_received_in_last_6month := replace(
  total_trx_amount_received_in_last_6month, 
  (date - i.date) > 180, NA)]
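
As an alternative to joining on from and filtering afterwards, data.table also supports non-equi joins, which can express the date range directly in the on clause. The following is a sketch on made-up data, not tested against the real tables; data_toy, df_toy, window_start and the shortened column name total are assumptions for illustration.

```r
library(data.table)

# Made-up inputs (hypothetical, for illustration only)
data_toy <- data.table(
  id   = 1:3,
  from = c("A", "A", "B"),
  date = as.Date(c("2005-01-02", "2005-06-01", "2004-01-01"))
)
df_toy <- data.table(
  from  = c("A", "B"),
  date  = as.Date(c("2005-01-04", "2005-01-08")),
  total = c(700, 12032)
)

# Start of the 180-day window that ends at each df date
df_toy[, window_start := date - 180]

# For every row of data_toy, look up df_toy rows with the same `from`
# whose window [window_start, date] contains the transaction date.
# Unmatched rows get NA because nomatch = NA is the default.
res <- df_toy[data_toy,
  on = .(from, window_start <= date, date >= date),
  .(id = i.id, from = i.from, date = i.date, total = x.total)]

print(res)
```

If one transaction falls inside several windows for the same account, this returns one row per match, so you would still have to decide how to collapse them (for example with the mult argument or an aggregation by id).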