I have a data frame like the one shown below:
structure(list(ID = c(1, 1, 1, 1, 1,1,1), Start_Date = c("01-09-17",
"01-09-17", "08-09-17", "08-09-17", "08-09-17","15-09-17","15-09-17"), End_Date = c("07-09-17",
"07-09-17", "14-09-17", "14-09-17", "14-09-17","21-09-17","21-09-17"), Policy1_Date = c("05-09-17",
NA, "09-09-17", NA, "10-09-17","16-09-17","17-09-17"), Policy2_Date = c(NA, "06-09-17",
"08-09-17", "09-09-17", "10-09-17",NA,NA)), class = "data.frame", row.names = c(NA,
-7L))
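What do I need? For each ID, I need to compute the difference between two dates (for each policy). Start_Date and End_Date describe the start and end of a week. Policy1_Date and Policy2_Date hold the transaction dates that fall within that week. For each week, I want to find the date difference as follows:
Policy 1: For each week, e.g. 08-09-17 to 14-09-17, I take the week's start date (08-09-17) and subtract the previous date (one that is not in the same week), i.e. 05-09-17. The most recent previous non-NA date should be used.
Policy 2: For each week, e.g. 08-09-17 to 14-09-17, I need the difference between 08-09-17 and 06-09-17.
If there is no previous date, I simply enter NA. For example, for the week 01-09-17 to 07-09-17 the latest transaction is 05-09-17, but there is no earlier transaction, so I mark this week as NA.
Final output: this will have 5 columns (ID, Start_Date, End_Date, Policy1_Gap, Policy2_Gap).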
structure(list(ID = c(1, 1, 1), Start_Date = c("01-09-17", "08-09-17",
"15-09-17"), End_Date = c("07-09-17", "14-09-17", "21-09-17"),
Policy1_Gap = c(NA, 5, 7), Policy2_Gap = c(NA, 4, NA)), class = "data.frame", row.names = c(NA,
-3L))
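Note: the whole aggregation is done per group/ID. I do not take differences between dates that belong to two different groups/IDs.
Answer 0 (score: 1)
Here is an option using data.table: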
cols <- c("Policy1_Date", "Policy2_Date")
#convert columns into Date class
DT[, c("Start_Date", "End_Date", cols) := lapply(.SD, as.IDate, format="%d-%m-%y"),
.SDcols=c("Start_Date", "End_Date", cols)]
#for each ID, Start_Date and End_Date, find the last non-NA date for each column
DT[, lapply(.SD, function(x) last(x[!is.na(x)])), .(ID, Start_Date, End_Date), .SDcols=cols][,
#calculate the gap between the last dates of consecutive weeks
c("Policy1_Gap","Policy2_Gap") := lapply(.SD, function(x) c(NA_integer_, diff(x))), ID, .SDcols=cols][,
#calculate the gap between each week's Start_Date and the previous week's last date
c("Policy1_Diff","Policy2_Diff") := lapply(.SD, function(x) Start_Date - shift(x)), ID, .SDcols=cols][]
Output:
   ID Start_Date   End_Date Policy1_Date Policy2_Date Policy1_Gap Policy2_Gap Policy1_Diff Policy2_Diff
1:  1 2017-09-01 2017-09-07   2017-09-05   2017-09-06          NA          NA           NA           NA
2:  1 2017-09-08 2017-09-14   2017-09-10   2017-09-10           5           4            3            2
3:  1 2017-09-15 2017-09-21   2017-09-17         <NA>           7          NA            5            5
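The output carries two families of columns because the question can be read two ways: the `_Gap` columns match the expected output posted in the question, while the `_Diff` columns follow the written rule (week start minus the previous week's last transaction). The following is a minimal base-R sketch of that distinction for Policy 1 only. The helper `to_date` and the vectors `last_p1`, `week_start` and `prev` are illustrative names, not part of the original answer, and the dates are typed in by hand from the sample data.

```r
# Hypothetical helper: parse the dd-mm-yy strings used in the question
to_date <- function(x) as.Date(x, format = "%d-%m-%y")

# Last non-NA Policy1 transaction in each of the three weeks (from the sample data)
last_p1    <- to_date(c("05-09-17", "10-09-17", "17-09-17"))
# Start date of each week
week_start <- to_date(c("01-09-17", "08-09-17", "15-09-17"))

# "Gap": days between the last transactions of consecutive weeks
gap <- c(NA, as.integer(diff(last_p1)))

# "Diff": days between a week's start and the previous week's last transaction
prev <- c(as.Date(NA), head(last_p1, -1))
dif  <- as.integer(week_start - prev)

print(data.frame(week_start, last_p1, gap, dif))
```

Here `gap` reproduces the Policy1_Gap column and `dif` reproduces the Policy1_Diff column; which one is wanted depends on which reading of the question is intended.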
Data:
library(data.table)
DT <- setDT(structure(list(ID = c(1, 1, 1, 1, 1,1,1), Start_Date = c("01-09-17",
"01-09-17", "08-09-17", "08-09-17", "08-09-17","15-09-17","15-09-17"), End_Date = c("07-09-17",
"07-09-17", "14-09-17", "14-09-17", "14-09-17","21-09-17","21-09-17"), Policy1_Date = c("05-09-17",
NA, "09-09-17", NA, "10-09-17","16-09-17","17-09-17"), Policy2_Date = c(NA, "06-09-17",
"08-09-17", "09-09-17", "10-09-17",NA,NA)), class = "data.frame", row.names = c(NA,
-7L)))
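If only the five columns requested in the question are wanted, the steps above can be condensed as sketched below. This is one possible arrangement rather than the answer's exact code: the table is rebuilt with `data.table()` for self-containment, and `last_non_na`, `date_cols`, `weekly` and `result` are hypothetical names introduced here. The helper returns an explicit NA when a week has no transaction for a policy, which is an assumption about the desired behaviour in that case.

```r
library(data.table)

# Sample data from the question (7 rows)
DT <- data.table(
  ID = c(1, 1, 1, 1, 1, 1, 1),
  Start_Date = c("01-09-17", "01-09-17", "08-09-17", "08-09-17", "08-09-17", "15-09-17", "15-09-17"),
  End_Date = c("07-09-17", "07-09-17", "14-09-17", "14-09-17", "14-09-17", "21-09-17", "21-09-17"),
  Policy1_Date = c("05-09-17", NA, "09-09-17", NA, "10-09-17", "16-09-17", "17-09-17"),
  Policy2_Date = c(NA, "06-09-17", "08-09-17", "09-09-17", "10-09-17", NA, NA)
)

cols <- c("Policy1_Date", "Policy2_Date")
date_cols <- c("Start_Date", "End_Date", cols)

# Convert the character columns to IDate
DT[, (date_cols) := lapply(.SD, as.IDate, format = "%d-%m-%y"), .SDcols = date_cols]

# Hypothetical helper: last non-NA value, or a typed NA if every value is NA
last_non_na <- function(x) {
  keep <- which(!is.na(x))
  if (length(keep)) x[keep[length(keep)]] else x[1L]
}

# One row per ID and week, holding the last transaction date of each policy
weekly <- DT[, lapply(.SD, last_non_na), by = .(ID, Start_Date, End_Date), .SDcols = cols]

# Days between the last transactions of consecutive weeks, within each ID
weekly[, c("Policy1_Gap", "Policy2_Gap") :=
         lapply(.SD, function(x) c(NA_integer_, diff(x))),
       by = ID, .SDcols = cols]

# Keep only the columns listed in the question
result <- weekly[, .(ID, Start_Date, End_Date, Policy1_Gap, Policy2_Gap)]
print(result)
```

To follow the written rule instead of the posted expected output, the `diff(x)` step can be swapped for `Start_Date - shift(x)`, as in the `_Diff` columns of the answer.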