Count dates for each ID after a match

Time: 2018-04-25 08:01:36

Tags: r sorting date count match

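I want to create a column (res) in df1 that holds a count for each row within each ID (the counts will be in descending order). The column should match date B against date C and count the dates in date C (from df2) that are later than date A. For each ID, date C contains more dates than date B.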

DF1

ID  date A      date B
17  27/06/12    26/07/12
17  21/02/13    21/02/13
17  23/01/14    23/01/14
17  5/02/15     5/02/15
17  28/11/16    16/06/16 
18  25/07/13    22/05/13
18  29/10/14    1/12/14
18  11/05/15    1/12/14
21  27/09/12    16/07/12
21  25/07/14    11/08/14
21  15/07/15    24/02/15

DF2

ID  date C
17  09/02/12
17  26/07/12
17  21/02/13
17  23/01/14
17  19/06/14
17  24/07/14
17  5/02/15
17  26/02/15
17  28/05/15
17  20/08/15
17  24/03/16
17  16/06/16
18  22/05/13
18  16/10/13
18  5/05/14
18  1/12/14
21  16/07/12
21  27/05/13
21  10/02/14
21  11/08/14
21  24/02/15

df1 with the new column added:

df1
ID  date A      date B     res
17  27/06/12    26/07/12    11
17  21/02/13    21/02/13    9
17  23/01/14    23/01/14    8
17  5/02/15     5/02/15     5
17  28/11/16    16/06/16    0 
18  25/07/13    22/05/13    3
18  29/10/14    1/12/14     1
18  11/05/15    1/12/14     0
21  27/09/12    16/07/12    4
21  25/07/14    11/08/14    2
21  15/07/15    24/02/15    0

1 answer:

Answer 0 (score: 2)

The data.table package is well suited to this kind of non-equi join.

df1[, res:=0L][match(df2$dateC, dateB), 
    res := df2[.SD, on=.(ID, dateC > dateA), .N, by=.EACHI]$N]
df1

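In the code above, res is first initialised to 0.

Then df1 is subset to the rows whose dateB matches a value of dateC in df2.

Then that subset of df1 is joined to df2 on ID and dateC > dateA.

For each row of df1 (i.e. by=.EACHI), the number of matching rows is returned.

Because the result of the join is a data.table, $N extracts the column named N from it once the counts have been computed.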
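The inner non-equi join can also be tried on its own, without the dateB filter or the assignment. The following is a minimal sketch that uses a handful of rows taken from the question's data; the name counts is only illustrative and does not appear in the answer.

```r
library(data.table)

# A few rows taken from the question's data (dates already parsed).
df1 <- data.table(
  ID    = c(17L, 17L, 18L),
  dateA = as.Date(c("2012-06-27", "2016-11-28", "2014-10-29"))
)
df2 <- data.table(
  ID    = c(17L, 17L, 17L, 18L, 18L),
  dateC = as.Date(c("2012-02-09", "2012-07-26", "2013-02-21",
                    "2013-05-22", "2014-12-01"))
)

# For each row of df1 (by = .EACHI), count the rows of df2 that have the
# same ID and a dateC strictly later than that row's dateA.
counts <- df2[df1, on = .(ID, dateC > dateA), .N, by = .EACHI]

# counts has one row per row of df1; the column N holds the counts.
print(counts$N)
```

Because the join runs once per row of df1, counts$N lines up with the rows of df1 in their original order, which is what allows it to be assigned back with :=.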

Alternatively, use an equi-join for the dateB match:

df1[, res:=0L][df2, on=.(ID, dateB=dateC), 
    res := df2[.SD, on=.(ID, dateC > dateA), .N, by=.EACHI]$N][]

Data:

library(data.table)

df1 <- fread("ID  dateA      dateB
17  27/06/12    26/07/12
17  21/02/13    21/02/13
17  23/01/14    23/01/14
17  5/02/15     5/02/15
17  28/11/16    16/06/16 
18  25/07/13    22/05/13
18  29/10/14    1/12/14
18  11/05/15    1/12/14
21  27/09/12    16/07/12
21  25/07/14    11/08/14
21  15/07/15    24/02/15")
cols <- c("dateA", "dateB")
df1[, (cols) := lapply(.SD, as.Date, format="%d/%m/%y"), .SDcols=cols]

df2 <- fread("ID  dateC
17  09/02/12
17  26/07/12
17  21/02/13
17  23/01/14
17  19/06/14
17  24/07/14
17  5/02/15
17  26/02/15
17  28/05/15
17  20/08/15
17  24/03/16
17  16/06/16
18  22/05/13
18  16/10/13
18  5/05/14
18  1/12/14
21  16/07/12
21  27/05/13
21  10/02/14
21  11/08/14
21  24/02/15")
df2[, dateC := as.Date(dateC, "%d/%m/%y")]
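
As a cross-check that is not part of the original answer, the expected res column can be reproduced in base R by counting, for every row of df1, the df2 dates with the same ID that fall after dateA. This is a sketch under the assumption that the dateB match does not filter out any row, which holds for this data because every dateB also occurs in dateC for the same ID. The names x1, x2, d and count_later are hypothetical, chosen so that the df1 and df2 objects above are not overwritten.

```r
# Helper to parse the day/month/two-digit-year strings used in the question.
d <- function(x) as.Date(x, format = "%d/%m/%y")

# Same data as above, typed in compactly as plain data frames.
x1 <- data.frame(
  ID    = rep(c(17L, 18L, 21L), times = c(5, 3, 3)),
  dateA = d(c("27/06/12", "21/02/13", "23/01/14", "5/02/15", "28/11/16",
              "25/07/13", "29/10/14", "11/05/15",
              "27/09/12", "25/07/14", "15/07/15"))
)
x2 <- data.frame(
  ID    = rep(c(17L, 18L, 21L), times = c(12, 4, 5)),
  dateC = d(c("09/02/12", "26/07/12", "21/02/13", "23/01/14", "19/06/14",
              "24/07/14", "5/02/15", "26/02/15", "28/05/15", "20/08/15",
              "24/03/16", "16/06/16",
              "22/05/13", "16/10/13", "5/05/14", "1/12/14",
              "16/07/12", "27/05/13", "10/02/14", "11/08/14", "24/02/15"))
)

# For row i of x1, count the x2 rows with the same ID and a later dateC.
count_later <- function(i) {
  sum(x2$ID == x1$ID[i] & x2$dateC > x1$dateA[i])
}
x1$res <- vapply(seq_len(nrow(x1)), count_later, integer(1))

print(x1)
```

This loops over rows, so it is meant only for checking results on small inputs; the data.table join is the approach the answer recommends.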