I have two datasets:
df1 = data.frame(id = c("A","A","B","B","B","C","C"),
                 date.lastused = c("29/05/2010", "23/08/2014","23/08/2011", "18/04/2014","25/09/2015", "18/11/2013","04/01/2013"))
df2 = data.frame(id = c("A","A","A","A","B","B","B","B","B","B","C","C","C"),
                 sample.date = c("21/02/2013", "03/05/2014", "19/07/2016", "31/07/2013", "07/10/2011", "16/01/2012", "10/07/2014","20/09/2015", "29/11/2016", "15/08/2014", "27/09/2011", "27/01/2012", "09/03/2014"),
                 tcc = c(126,109,69,111,14,13.8,14.1,14, 14.4,143,102,114,116))
For each row of df2, I want to bring in, by id, the date.lastused from df1 that is closest to sample.date. The final dataset should look like this:
> finaldt
   id       date price date.lastused
1   A 21/02/2013 126.0    29/05/2010
2   A 03/05/2014 109.0    29/05/2010
3   A 19/07/2016  69.0    23/08/2014
4   A 31/07/2013 111.0    23/08/2014
5   B 07/10/2011  14.0    23/08/2011
6   B 16/01/2012  13.8    23/08/2011
7   B 10/07/2014  14.1    18/04/2014
8   B 20/09/2015  14.0    18/04/2014
9   B 29/11/2016  14.4    25/09/2015
10  B 15/08/2014 143.0    18/04/2014
11  C 27/09/2011 102.0            NA
12  C 27/01/2012 114.0            NA
13  C 09/03/2014 116.0    18/11/2013
Does anyone have any ideas?
Answer 0 (score: 3)
You can use a data.table rolling join:
library(data.table)
setDT(df1); setDT(df2);
df1[, date.lastused := as.Date(date.lastused, '%d/%m/%Y')]
df2[, sample.date := as.Date(sample.date, '%d/%m/%Y')]
df1[
  df2,
  # extract id, sample.date, tcc from df2 with prefix of i,
  # date.lastused from df1 with prefix of x
  .(id = i.id, date = i.sample.date, price = i.tcc, date.lastused = x.date.lastused),
  on = .(id, date.lastused = sample.date), # join on id and date columns
  roll = Inf # roll the last date.lastused forward to each sample.date
]
#     id       date price date.lastused
#  1:  A 2013-02-21 126.0    2010-05-29
#  2:  A 2014-05-03 109.0    2010-05-29
#  3:  A 2016-07-19  69.0    2014-08-23
#  4:  A 2013-07-31 111.0    2010-05-29
#  5:  B 2011-10-07  14.0    2011-08-23
#  6:  B 2012-01-16  13.8    2011-08-23
#  7:  B 2014-07-10  14.1    2014-04-18
#  8:  B 2015-09-20  14.0    2014-04-18
#  9:  B 2016-11-29  14.4    2015-09-25
# 10:  B 2014-08-15 143.0    2014-04-18
# 11:  C 2011-09-27 102.0          <NA>
# 12:  C 2012-01-27 114.0          <NA>
# 13:  C 2014-03-09 116.0    2013-11-18
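To make explicit what the rolling join computes, here is a minimal base R sketch that looks up, for each row of df2, the latest date.lastused with the same id that falls on or before sample.date. It is only a cross-check under that reading of the join, not a replacement for it: the loop and the variable names latest and candidates are illustrative assumptions, and it will be much slower than data.table on large inputs.

```r
# Rebuild the question's data with real Date columns.
df1 <- data.frame(
  id = c("A","A","B","B","B","C","C"),
  date.lastused = as.Date(c("29/05/2010", "23/08/2014", "23/08/2011", "18/04/2014",
                            "25/09/2015", "18/11/2013", "04/01/2013"), "%d/%m/%Y"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id = c("A","A","A","A","B","B","B","B","B","B","C","C","C"),
  sample.date = as.Date(c("21/02/2013", "03/05/2014", "19/07/2016", "31/07/2013",
                          "07/10/2011", "16/01/2012", "10/07/2014", "20/09/2015",
                          "29/11/2016", "15/08/2014", "27/09/2011", "27/01/2012",
                          "09/03/2014"), "%d/%m/%Y"),
  tcc = c(126, 109, 69, 111, 14, 13.8, 14.1, 14, 14.4, 143, 102, 114, 116),
  stringsAsFactors = FALSE
)

# Start with NA everywhere; rows with no earlier date.lastused stay NA.
latest <- rep(as.Date(NA), nrow(df2))
for (i in seq_len(nrow(df2))) {
  # Dates for the same id that are on or before this sample date.
  candidates <- df1$date.lastused[df1$id == df2$id[i] &
                                  df1$date.lastused <= df2$sample.date[i]]
  if (length(candidates) > 0) latest[i] <- max(candidates)
}
df2$date.lastused <- latest
print(df2)
```

Note that this "last value on or before" rule is what roll = Inf gives. If "closest" should mean closest in either direction, data.table also accepts roll = "nearest", which would change some rows (for example, sample dates shortly before a date.lastused).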