My data frame df contains each customer's name, join date (dj), expiry date (exp) and cohort:
   names         dj        exp  cohort
  (fctr)     (date)     (date)   (chr)
1    Tom 2011-05-01 2011-06-22 2011-05
2  David 2011-06-01 2011-07-19 2011-06
3   Jack 2011-05-03 2012-01-03 2011-05
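library(tibble)  # as_tibble() replaces dplyr's deprecated tbl_df()
names <- c("Tom", "David", "Jack")
dj <- as.Date(c("2011-05-01", "2011-06-01", "2011-05-03"))
exp <- as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
df <- data.frame(names, dj, exp)
df$cohort <- format(df$dj, "%Y-%m")  # cohort = year-month of joining
as_tibble(df)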
The vector DateColumns holds the first day of every calendar month from 1 May 2011 to 1 December 2015:
DateColumns <- seq.Date(as.Date("2011/05/01"), as.Date("2015/12/1"), by = "1 month")
For each calendar month, I want to check whether a customer was active in it. Active is defined as exp > DateColumns & dj <= DateColumns.
Desired output:
names         dj        exp  cohort 2011-05-01 2011-06-01 2011-07-01 2011-08-01 ...
  Tom 2011-05-01 2011-06-22 2011-05       TRUE       TRUE      FALSE      FALSE ...
David 2011-06-01 2011-07-19 2011-06      FALSE       TRUE       TRUE      FALSE ...
 Jack 2011-05-03 2012-01-03 2011-05       TRUE       TRUE       TRUE       TRUE ...
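A minimal base-R sketch of the calendar-month check, assuming the same df as above. One caveat: the literal rule dj <= DateColumns compares against the first day of each month, so Jack (joined 2011-05-03) would be FALSE for 2011-05, while the desired output shows TRUE. The sketch therefore assumes "active in a month" means the membership interval [dj, exp] overlaps that month at all (dj on or before the month's last day and exp on or after its first day), which reproduces the desired output. The names month_end and active are illustrative, not from the question.

```r
# Reproduce the question's data
names <- c("Tom", "David", "Jack")
dj  <- as.Date(c("2011-05-01", "2011-06-01", "2011-05-03"))
exp <- as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
df  <- data.frame(names, dj, exp)
df$cohort <- format(df$dj, "%Y-%m")

# First day of every calendar month in the range
DateColumns <- seq.Date(as.Date("2011-05-01"), as.Date("2015-12-01"), by = "1 month")
# Hypothetical helper vector: last day of each of those months
month_end <- seq.Date(DateColumns[1], by = "1 month",
                      length.out = length(DateColumns) + 1)[-1] - 1

# active[i, j] is TRUE when customer i's [dj, exp] interval overlaps month j
active <- outer(seq_len(nrow(df)), seq_along(DateColumns),
                function(i, j) df$dj[i] <= month_end[j] & df$exp[i] >= DateColumns[j])
colnames(active) <- format(DateColumns)

# cbind() on a data frame keeps "2011-05-01" etc. as the column names
res <- cbind(df, active)
res[, 1:8]
```

Using outer() keeps the comparison vectorised over every customer/month pair, so the result is a plain logical matrix with one column per month (56 here).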
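Below is the code I wrote. Unfortunately, it only compares the expiry date with the calendar dates in DateColumns and never checks dj, so, for example, David is TRUE in X1 where he should be FALSE. How can I fix this?
Code (gives the wrong output):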
library(tibble)  # as_tibble() replaces dplyr's deprecated tbl_df()
names <- c("Tom", "David", "Jack")
dj <- as.Date(c("2011-05-01", "2011-06-01", "2011-05-03"))
exp <- as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
df <- data.frame(names, dj, exp)
df$cohort <- format(df$dj, "%Y-%m")
DateColumns <- seq.Date(as.Date("2011/05/01"), as.Date("2015/12/1"), by = "1 month")
# Only exp is compared with each month; dj is never checked
DateColumnvalues <- t(sapply(df$exp, function(x) x > DateColumns))
df2 <- data.frame(df, DateColumnvalues)
as_tibble(df2)
Output:
   names         dj        exp  cohort    X1    X2    X3    X4    X5    X6
  (fctr)     (date)     (date)   (chr) (lgl) (lgl) (lgl) (lgl) (lgl) (lgl)
1    Tom 2011-05-01 2011-06-22 2011-05  TRUE  TRUE FALSE FALSE FALSE FALSE
2  David 2011-06-01 2011-07-19 2011-06  TRUE  TRUE  TRUE FALSE FALSE FALSE   <- X1 should be FALSE
3   Jack 2011-05-03 2012-01-03 2011-05  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
Variables not shown: X7 (lgl), X8 (lgl), ..., X56 (lgl)
Note: X1 corresponds to "2011-05-01", X2 to "2011-06-01", and so on, so each column represents one calendar month.
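If the X1, X2, ... names are part of the problem, one option (a sketch, not taken from the question) is to name the matrix columns with format(DateColumns) and pass check.names = FALSE to data.frame(). By default data.frame() runs column names through make.names(), which turns "2011-05-01" into "X2011.05.01". The sketch keeps the question's exp-only comparison purely to show the naming; default_names and kept_names are illustrative.

```r
# Same calendar-month vector and expiry dates as in the question
DateColumns <- seq.Date(as.Date("2011-05-01"), as.Date("2015-12-01"), by = "1 month")
exp <- as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))

# The question's matrix (expiry check only), now with date column names
vals <- t(sapply(exp, function(x) x > DateColumns))
colnames(vals) <- format(DateColumns)

# data.frame() makes names syntactic by default; check.names = FALSE keeps them
default_names <- names(data.frame(vals))
kept_names <- names(data.frame(vals, check.names = FALSE))
head(default_names, 2)
head(kept_names, 2)
```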
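Second, I want to convert this data to a "relative" layout, aligned on each customer's joining month (cohort) instead of on calendar months. For example, if customer "Dick" joined in January 2015 and expires in December 2015, his M0 should be TRUE, where M0 refers to January 2015 (his joining month), not to the first calendar month.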
names         dj        exp  cohort M0 M1 M2 M3 ... M55
 Dick 2015-01-11 2015-12-10 2015-01  T  T  T  T ...
  Tom 2011-05-01 2011-06-22 2011-05  T  T  F  F ...
David 2011-06-01 2011-07-19 2011-06  T  T  F  F ...
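A minimal base-R sketch of the relative layout, assuming "active in Mk" means the customer overlaps the k-th calendar month counted from the joining month. Converting each date to a month index (year * 12 + month) turns the offset into integer arithmetic, so the day of the month no longer matters. The helper month_index, the variable last_k and the M0 ... M55 column names are illustrative. Under this rule Dick is active from M0 (January 2015) through M11 (December 2015).

```r
df <- data.frame(
  names = c("Dick", "Tom", "David", "Jack"),
  dj  = as.Date(c("2015-01-11", "2011-05-01", "2011-06-01", "2011-05-03")),
  exp = as.Date(c("2015-12-10", "2011-06-22", "2011-07-19", "2012-01-03"))
)
df$cohort <- format(df$dj, "%Y-%m")

# Hypothetical helper: months since year 0, so differences count calendar months
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) * 12 + lt$mon
}

# Offset of the expiry month from the joining month (0 = same month)
last_k <- month_index(df$exp) - month_index(df$dj)

n_rel <- 56  # M0 .. M55
# rel[i, k + 1] is TRUE when customer i is still active k months after joining
rel <- outer(last_k, 0:(n_rel - 1), ">=")
colnames(rel) <- paste0("M", 0:(n_rel - 1))
res <- cbind(df, rel)
res[, 1:8]
```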
Answer:
For the first question, you could do:
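library(data.table)
library(lubridate)  # for months()
# Key the customers by their membership interval [dj, exp]
dt <- data.table(df, key = c("dj", "exp"))
# One row per calendar month [start, end]; 2011-05 .. 2011-08 for the demo,
# use as.Date("2015-12-01") as the upper bound for the full range
dates <- setDT(transform(data.frame(start = seq.Date(as.Date("2011-05-01"), as.Date("2011-08-01"), "1 month")),
                         end = start + months(1) - 1),
               key = c("start", "end"))
# foverlaps() matches each customer with every month that overlaps [dj, exp];
# dcast() spreads the months into TRUE/FALSE columns
dcast(foverlaps(dt, dates)[, val := TRUE], names + dj + exp + cohort ~ start, value.var = "val", fill = FALSE)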
# names dj exp cohort 2011-05-01 2011-06-01 2011-07-01 2011-08-01
# 1: David 2011-06-01 2011-07-19 2011-06 FALSE TRUE TRUE FALSE
# 2: Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE
# 3: Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE
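For the second question, if I understand you correctly, I would go with:
# Map() rather than apply(): apply() would simplify to a matrix whenever
# every customer spans the same number of months
lst <- Map(function(from, to) rep(TRUE, length(seq(from, to, by = "month"))), df$dj, df$exp)
n <- max(lengths(lst))
# Pad each vector with NA up to length n, stack the rows, then turn NA into FALSE
res <- cbind(df, do.call(rbind, lapply(lst, function(x) `length<-`(x, n))))
res[is.na(res)] <- FALSE; res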
# names dj exp cohort 1 2 3 4 5 6 7 8 9
# 1 Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 2 David 2011-06-01 2011-07-19 2011-06 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 3 Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE