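I want to create a column (res) in df1 that lists a count for each ID (the counts will be in descending order). For each row, this column should match date B against date C and count the dates in date C (from df2) that are later than date A. Each ID will have more dates in date C than in date B.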
DF1
ID date A date B
17 27/06/12 26/07/12
17 21/02/13 21/02/13
17 23/01/14 23/01/14
17 5/02/15 5/02/15
17 28/11/16 16/06/16
18 25/07/13 22/05/13
18 29/10/14 1/12/14
18 11/05/15 1/12/14
21 27/09/12 16/07/12
21 25/07/14 11/08/14
21 15/07/15 24/02/15
DF2
ID date C
17 09/02/12
17 26/07/12
17 21/02/13
17 23/01/14
17 19/06/14
17 24/07/14
17 5/02/15
17 26/02/15
17 28/05/15
17 20/08/15
17 24/03/16
17 16/06/16
18 22/05/13
18 16/10/13
18 5/05/14
18 1/12/14
21 16/07/12
21 27/05/13
21 10/02/14
21 11/08/14
21 24/02/15
df1 with the new column added:
df1
ID date A date B res
17 27/06/12 26/07/12 11
17 21/02/13 21/02/13 9
17 23/01/14 23/01/14 8
17 5/02/15 5/02/15 5
17 28/11/16 16/06/16 0
18 25/07/13 22/05/13 3
18 29/10/14 1/12/14 1
18 11/05/15 1/12/14 0
21 27/09/12 16/07/12 4
21 25/07/14 11/08/14 2
21 15/07/15 24/02/15 0
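One way to pin down what res means is to recompute it by brute force. The sketch below is an assumption-based check, not part of the question: it assumes res is the number of df2 dates with the same ID that are strictly later than date A (ties excluded), and it ignores date B, because in this sample every date B already appears among the date C values of its ID. It uses base R only, and the helper to_date is a hypothetical name.

```r
# Brute-force check of the expected res column (base R only).
# Assumption: res = number of df2 dates (same ID) strictly later than dateA.
to_date <- function(x) as.Date(x, format = "%d/%m/%y")  # hypothetical helper

df1 <- data.frame(
  ID = c(17, 17, 17, 17, 17, 18, 18, 18, 21, 21, 21),
  dateA = to_date(c("27/06/12", "21/02/13", "23/01/14", "5/02/15", "28/11/16",
                    "25/07/13", "29/10/14", "11/05/15",
                    "27/09/12", "25/07/14", "15/07/15"))
)
df2 <- data.frame(
  ID = c(rep(17, 12), rep(18, 4), rep(21, 5)),
  dateC = to_date(c("09/02/12", "26/07/12", "21/02/13", "23/01/14", "19/06/14",
                    "24/07/14", "5/02/15", "26/02/15", "28/05/15", "20/08/15",
                    "24/03/16", "16/06/16",
                    "22/05/13", "16/10/13", "5/05/14", "1/12/14",
                    "16/07/12", "27/05/13", "10/02/14", "11/08/14", "24/02/15"))
)

# For each df1 row, count the df2 dates of the same ID that come after dateA.
df1$res <- vapply(seq_len(nrow(df1)), function(i) {
  sum(df2$dateC[df2$ID == df1$ID[i]] > df1$dateA[i])
}, integer(1))
print(df1)
```

Under those assumptions the computed column reproduces the expected res values above (11, 9, 8, 5, 0, 3, 1, 0, 4, 2, 0).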
Answer (score: 2)
The data.table package is well suited to this kind of non-equi join.
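# Initialise res to 0, then update only the df1 rows returned by match(),
# i.e. rows whose dateB equals some df2$dateC (dates only, first match per value)
df1[, res:=0L][match(df2$dateC, dateB),
    res := df2[.SD, on=.(ID, dateC > dateA), .N, by=.EACHI]$N]
df1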
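In the code above, res is first initialised to 0. The rows of df1 to update are then selected with match(df2$dateC, dateB), i.e. the rows whose dateB equals one of the dateC values in df2. For those rows (.SD), df2 is joined on ID and dateC > dateA, and for each row of df1 (that is, by=.EACHI) the number of matching rows, .N, is returned. Since the result of this join is a data.table, $N extracts its column named N, which is then assigned to res.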
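To see what the inner non-equi join returns on its own, here is a minimal sketch on the ID 18 rows of the question only; x and y are hypothetical stand-ins for the relevant slices of df1 and df2. It assumes data.table 1.9.8 or later, where non-equi conditions in on= are supported. With by=.EACHI, a row of x that has no match in y gets .N = 0.

```r
library(data.table)  # non-equi joins in on= need data.table >= 1.9.8

# Hypothetical slices of df1 and df2: the ID 18 rows from the question.
x <- data.table(ID = 18L,
                dateA = as.Date(c("25/07/13", "29/10/14", "11/05/15"), "%d/%m/%y"))
y <- data.table(ID = 18L,
                dateC = as.Date(c("22/05/13", "16/10/13", "5/05/14", "1/12/14"), "%d/%m/%y"))

# For each row of x (by = .EACHI), count the rows of y with the same ID
# and dateC > dateA; a row of x with no match gets .N = 0.
out <- y[x, on = .(ID, dateC > dateA), .N, by = .EACHI]
print(out)
print(out$N)
```

out$N comes out as 3, 1, 0, which is the res column for ID 18 in the expected output.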
Alternatively, use an equi-join on ID and dateB = dateC to select the rows to update:
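# Update join: res is computed only for df1 rows whose (ID, dateB) equals an (ID, dateC) in df2;
# the trailing [] prints the result
df1[, res:=0L][df2, on=.(ID, dateB=dateC),
    res := df2[.SD, on=.(ID, dateC > dateA), .N, by=.EACHI]$N][]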
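The two versions do not select exactly the same rows to update. The base R sketch below uses the ID 18 dates from the question (dateB, dateC and to_date are hypothetical local names). match(df2$dateC, dateB) compares dates only, ignoring ID, and returns only the first matching position, so when two df1 rows share a dateB only the first one is updated; the equi-join on ID and dateB = dateC updates every matching row.

```r
# ID 18 dates from the question: dateB from df1, dateC from df2.
to_date <- function(x) as.Date(x, format = "%d/%m/%y")  # hypothetical helper
dateB <- to_date(c("22/05/13", "1/12/14", "1/12/14"))
dateC <- to_date(c("22/05/13", "16/10/13", "5/05/14", "1/12/14"))

# Rows selected by the first version: for each dateC, the FIRST position
# in dateB with the same date (NA if none), so row 3 is never selected.
first_version_rows <- match(dateC, dateB)
print(first_version_rows)

# Rows the equi-join updates: every row whose dateB equals some dateC
# (the join also requires equal ID; all rows here share ID 18).
equi_join_rows <- which(dateB %in% dateC)
print(equi_join_rows)
```

For this data the difference is harmless: the skipped row (ID 18, date A 11/05/15) has no later date C, so its res stays at the initial 0, which is also the expected value.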
Data:
library(data.table)
df1 <- fread("ID dateA dateB
17 27/06/12 26/07/12
17 21/02/13 21/02/13
17 23/01/14 23/01/14
17 5/02/15 5/02/15
17 28/11/16 16/06/16
18 25/07/13 22/05/13
18 29/10/14 1/12/14
18 11/05/15 1/12/14
21 27/09/12 16/07/12
21 25/07/14 11/08/14
21 15/07/15 24/02/15")
# parse the dd/mm/yy strings into Date objects
cols <- c("dateA", "dateB")
df1[, (cols) := lapply(.SD, as.Date, format="%d/%m/%y"), .SDcols=cols]
df2 <- fread("ID dateC
17 09/02/12
17 26/07/12
17 21/02/13
17 23/01/14
17 19/06/14
17 24/07/14
17 5/02/15
17 26/02/15
17 28/05/15
17 20/08/15
17 24/03/16
17 16/06/16
18 22/05/13
18 16/10/13
18 5/05/14
18 1/12/14
21 16/07/12
21 27/05/13
21 10/02/14
21 11/08/14
21 24/02/15")
# parse dateC the same way
df2[, dateC := as.Date(dateC, "%d/%m/%y")]