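Find the difference between the 2 most recent transactions in 2 different weeks, grouped by ID, in R

Time: 2019-12-27 07:53:23

Tags: r

I have a data frame like the one shown below: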

structure(list(ID = c(1, 1, 1, 1, 1, 1, 1), Start_Date = c("01-09-17", 
"01-09-17", "08-09-17", "08-09-17", "08-09-17", "15-09-17", "15-09-17"), End_Date = c("07-09-17", 
"07-09-17", "14-09-17", "14-09-17", "14-09-17", "21-09-17", "21-09-17"), Policy1_Date = c("05-09-17", 
NA, "09-09-17", NA, "10-09-17", "16-09-17", "17-09-17"), Policy2_Date = c(NA, "06-09-17", 
"08-09-17", "09-09-17", "10-09-17", NA, NA)), class = "data.frame", row.names = c(NA, 
-7L))

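What do I need? For each ID, I should calculate the gap between 2 dates (for each policy). Start_Date and End_Date describe the start and end of a week. Policy1_Date and Policy2_Date give the transaction dates, each listed in the week it belongs to. For each week, I want to find the difference between dates in the following way: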

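Policy 1: For each week, e.g. 08-09-17 to 14-09-17, I take the week's start date (08-09-17) and take the difference from the previous date (not in the same week), i.e. 05-09-17. The previous non-NA date should be used.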

Policy 2: For each week, e.g. 08-09-17 to 14-09-17, I need the difference between 08-09-17 and 06-09-17.

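If there is no previous date, I simply put NA. For example, for 01-09-17 to 07-09-17, the most recent transaction is 05-09-17, but there is no earlier transaction, so I mark this week as NA.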
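The rule can be read in two ways, and they give different numbers. A minimal base R sketch of the arithmetic for Policy 1 in the week 08-09-17 to 14-09-17, assuming the strings are day-month-year (the helper `d` and the variable names are only for illustration and are not part of the question):

```r
# Hypothetical helper: parse the dd-mm-yy strings used in the question
d <- function(x) as.Date(x, format = "%d-%m-%y")

# Reading A: last Policy1 date in the week minus the previous week's last date
gap_last_to_last <- as.numeric(d("10-09-17") - d("05-09-17"))   # 5

# Reading B: week start minus the previous week's last Policy1 date
gap_start_to_prev <- as.numeric(d("08-09-17") - d("05-09-17"))  # 3
```

Reading A gives the 5 that appears in the expected Policy1_Gap column, while reading B gives 3.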

Final output: this will have 5 columns (ID, Start_Date, End_Date, Policy1_Gap, Policy2_Gap).

structure(list(ID = c(1, 1, 1), Start_Date = c("01-09-17", "08-09-17", 
"15-09-17"), End_Date = c("07-09-17", "14-09-17", "21-09-17"), 
    Policy1_Gap = c(NA, 5, 7), Policy2_Gap = c(NA, 4, NA)), class = "data.frame", row.names = c(NA, 
-3L))

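Note: the whole aggregation is done per group/ID. I will not take differences between dates that belong to 2 different groups/IDs.

1 Answer:

Answer 0 (score: 1)

Here is an option using data.table: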

cols <- c("Policy1_Date", "Policy2_Date")
#convert columns into Date class
DT[, c("Start_Date", "End_Date", cols) := lapply(.SD, as.IDate, format="%d-%m-%y"), 
    .SDcols=c("Start_Date", "End_Date", cols)]

#for each ID, Start_Date and End_Date, find the last non-NA date for each column
DT[, lapply(.SD, function(x) last(x[!is.na(x)])), .(ID, Start_Date, End_Date), .SDcols=cols][,
    #gap between consecutive last dates within each ID
    c("Policy1_Gap","Policy2_Gap") := lapply(.SD, function(x) c(NA_integer_, diff(x))), ID, .SDcols=cols][,
    #gap between the week's Start_Date and the previous week's last date within each ID
    c("Policy1_Diff","Policy2_Diff") := lapply(.SD, function(x) Start_Date - shift(x)), ID, .SDcols=cols][]

Output:

   ID Start_Date   End_Date Policy1_Date Policy2_Date Policy1_Gap Policy2_Gap Policy1_Diff Policy2_Diff
1:  1 2017-09-01 2017-09-07   2017-09-05   2017-09-06          NA          NA           NA           NA
2:  1 2017-09-08 2017-09-14   2017-09-10   2017-09-10           5           4            3            2
3:  1 2017-09-15 2017-09-21   2017-09-17         <NA>           7          NA            5            5
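The output carries two pairs of columns because the chain computes two different quantities. A small sketch on the aggregated Policy1 dates alone may make the distinction clearer; it assumes data.table is installed, and the variable names are illustrative only:

```r
library(data.table)

# Last non-NA Policy1 date per week, as produced by the aggregation step
last_dates <- as.IDate(c("2017-09-05", "2017-09-10", "2017-09-17"))
week_start <- as.IDate(c("2017-09-01", "2017-09-08", "2017-09-15"))

# *_Gap: difference between consecutive last transaction dates
gap <- c(NA_integer_, diff(last_dates))           # NA 5 7

# *_Diff: week start minus the previous week's last transaction date
prev <- shift(last_dates)                         # lag by one; first element is NA
diff_from_start <- as.numeric(week_start - prev)  # NA 3 5
```

The `*_Gap` columns match the expected output in the question, while the `*_Diff` columns follow the "week start minus previous date" wording.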

Data:

library(data.table)
DT <- setDT(structure(list(ID = c(1, 1, 1, 1, 1, 1, 1), Start_Date = c("01-09-17", 
    "01-09-17", "08-09-17", "08-09-17", "08-09-17", "15-09-17", "15-09-17"), End_Date = c("07-09-17", 
    "07-09-17", "14-09-17", "14-09-17", "14-09-17", "21-09-17", "21-09-17"), Policy1_Date = c("05-09-17", 
    NA, "09-09-17", NA, "10-09-17", "16-09-17", "17-09-17"), Policy2_Date = c(NA, "06-09-17", 
    "08-09-17", "09-09-17", "10-09-17", NA, NA)), class = "data.frame", row.names = c(NA, 
    -7L)))
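
For comparison, the `*_Gap` columns can also be sketched in base R without data.table. This is not the answer's method, only an alternative under the same assumptions (dates in day-month-year form, gaps measured between the last non-NA date of consecutive weeks within an ID); `last_non_na` and `weeks` are names introduced here for illustration:

```r
# Same sample data as above, built directly as a 7-row data frame
df <- data.frame(
  ID = rep(1, 7),
  Start_Date   = c("01-09-17", "01-09-17", "08-09-17", "08-09-17",
                   "08-09-17", "15-09-17", "15-09-17"),
  End_Date     = c("07-09-17", "07-09-17", "14-09-17", "14-09-17",
                   "14-09-17", "21-09-17", "21-09-17"),
  Policy1_Date = c("05-09-17", NA, "09-09-17", NA,
                   "10-09-17", "16-09-17", "17-09-17"),
  Policy2_Date = c(NA, "06-09-17", "08-09-17", "09-09-17",
                   "10-09-17", NA, NA),
  stringsAsFactors = FALSE
)

# Convert all date columns to Date class
date_cols <- c("Start_Date", "End_Date", "Policy1_Date", "Policy2_Date")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d-%m-%y")

# Hypothetical helper: last non-NA date of a vector, or NA if there is none
last_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x)) x[length(x)] else as.Date(NA)
}

# One row per (ID, week), ordered by week start within each ID
weeks <- unique(df[c("ID", "Start_Date", "End_Date")])
weeks <- weeks[order(weeks$ID, weeks$Start_Date), ]

for (p in c("Policy1", "Policy2")) {
  col <- paste0(p, "_Date")
  # Last non-NA transaction date of this policy in each week
  last_date <- do.call(c, lapply(seq_len(nrow(weeks)), function(i) {
    in_week <- df$ID == weeks$ID[i] & df$Start_Date == weeks$Start_Date[i]
    last_non_na(df[[col]][in_week])
  }))
  # Gap in days to the previous week's last date, computed within each ID
  weeks[[paste0(p, "_Gap")]] <- ave(as.numeric(last_date), weeks$ID,
                                    FUN = function(x) c(NA, diff(x)))
}

weeks
```

On the sample data this gives Policy1_Gap of NA, 5, 7 and Policy2_Gap of NA, 4, NA, which agrees with the expected output in the question.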