Merge the nearest date from another dataset by ID in R

Time: 2017-10-30 16:50:54

Tags: r date merge dplyr

I have two datasets:

df1 = data.frame(id = c("A","A","B","B","B","C","C"), 
               date.lastused = c("29/05/2010", "23/08/2014","23/08/2011", "18/04/2014","25/09/2015", "18/11/2013","04/01/2013"))

df2 = data.frame(id = c("A","A","A","A","B","B","B","B","B","B","C","C","C"),
               sample.date = c("21/02/2013", "03/05/2014", "19/07/2016",    "31/07/2013",   "07/10/2011",   "16/01/2012",   "10/07/2014","20/09/2015",  "29/11/2016",       "15/08/2014",   "27/09/2011",   "27/01/2012",   "09/03/2014"),
               tcc = c(126,109,69,111,14,13.8,14.1,14,  14.4,143,102,114,116))

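For each row of df2, I want to bring in the date.lastused from df1 that is closest to its sample.date, matching by id. The final dataset should look like this: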

> finaldt
   id       date price date.lastused
1   A 21/02/2013 126.0    29/05/2010
2   A 03/05/2014 109.0    29/05/2010
3   A 19/07/2016  69.0    23/08/2014
4   A 31/07/2013 111.0    23/08/2014
5   B 07/10/2011  14.0    23/08/2011
6   B 16/01/2012  13.8    23/08/2011
7   B 10/07/2014  14.1    18/04/2014
8   B 20/09/2015  14.0    18/04/2014
9   B 29/11/2016  14.4    25/09/2015
10  B 15/08/2014 143.0    18/04/2014
11  C 27/09/2011 102.0            NA
12  C 27/01/2012 114.0            NA
13  C 09/03/2014 116.0    18/11/2013

Does anyone have any ideas?

1 answer:

Answer 0: (score: 3)

You can use a data.table rolling join:

library(data.table)

# Convert both data frames to data.tables by reference
setDT(df1)
setDT(df2)

# Parse the day/month/year strings as Date so they can be compared and rolled
df1[, date.lastused := as.Date(date.lastused, '%d/%m/%Y')]
df2[, sample.date := as.Date(sample.date, '%d/%m/%Y')]

df1[
    df2,
    # take id, sample.date and tcc from df2 (prefix i.)
    # and date.lastused from df1 (prefix x.)
    .(id = i.id, date = i.sample.date, price = i.tcc, date.lastused = x.date.lastused),
    on = .(id, date.lastused = sample.date),   # match on id, then roll on the date columns
    roll = Inf   # use the most recent date.lastused on or before sample.date
]

#    id       date price   date.lastused
# 1:  A 2013-02-21 126.0      2010-05-29
# 2:  A 2014-05-03 109.0      2010-05-29
# 3:  A 2016-07-19  69.0      2014-08-23
# 4:  A 2013-07-31 111.0      2010-05-29
# 5:  B 2011-10-07  14.0      2011-08-23
# 6:  B 2012-01-16  13.8      2011-08-23
# 7:  B 2014-07-10  14.1      2014-04-18
# 8:  B 2015-09-20  14.0      2014-04-18
# 9:  B 2016-11-29  14.4      2015-09-25
#10:  B 2014-08-15 143.0      2014-04-18
#11:  C 2011-09-27 102.0            <NA>
#12:  C 2012-01-27 114.0            <NA>
#13:  C 2014-03-09 116.0      2013-11-18
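
Note that roll = Inf carries the last earlier date forward, so it only ever looks backward in time. If "closest" should mean closest in either direction, data.table also accepts roll = "nearest". The following is a minimal sketch comparing the two on a small toy dataset; the tables `used` and `samples` and the helper `roll_join` are hypothetical names introduced here for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy data (not the question's data): one id, two usage dates
used <- data.table(
  id = "A",
  date.lastused = as.Date(c("2010-05-29", "2014-08-23"))
)
samples <- data.table(
  id = "A",
  sample.date = as.Date(c("2013-07-31", "2016-07-19"))
)

# Hypothetical helper: same join as in the answer, with the roll mode as a parameter
roll_join <- function(x, i, roll) {
  x[
    i,
    .(id = i.id, sample.date = i.sample.date, date.lastused = x.date.lastused),
    on = .(id, date.lastused = sample.date),
    roll = roll
  ]
}

# roll = Inf: most recent date.lastused on or before each sample.date
backward <- roll_join(used, samples, Inf)

# roll = "nearest": closest date.lastused in either direction
nearest <- roll_join(used, samples, "nearest")

print(backward)  # 2013-07-31 is matched to 2010-05-29 (the last earlier date)
print(nearest)   # 2013-07-31 is matched to 2014-08-23 (closer, although later)
```

Which mode is appropriate depends on whether a usage date after the sample date should be allowed to match; the sketch only shows how the choice changes the result.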