Cohort data transformation in R

Time: 2015-12-22 09:22:56

Tags: r

My data frame df contains customer names, the date of joining (dj), the expiry date (exp), and the cohort.

names         dj        exp  cohort
  (fctr)     (date)     (date)   (chr)
1    Tom 2011-05-01 2011-06-22 2011-05
2  David 2011-06-01 2011-07-19 2011-06
3   Jack 2011-05-03 2012-01-03 2011-05

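library(dplyr)  # provides tbl_df(); newer dplyr versions use as_tibble() instead

names <- c("Tom", "David", "Jack")
dj <- as.Date(c("2011-05-01", "2011-06-01", "2011-05-03"))
exp <- as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
df <- data.frame(names, dj, exp)
df$cohort <- format(df$dj, "%Y-%m")  # cohort = year and month of joining
tbl_df(df)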

The vector DateColumns <- seq.Date(as.Date("2011/05/01"), as.Date("2015/12/1"), by = "1 month") holds the calendar dates from 2011-05-01 to 2015-12-01, one per month.

From this I want to check whether a customer was active in a particular calendar month. Active is defined as the customer's exp > DateColumns & dj <= DateColumns.

Output 1 (desired):

names dj         exp        cohort  2011-05-01 2011-06-01 2011-07-01 2011-08-01 ...
Tom   2011-05-01 2011-06-22 2011-05 TRUE       TRUE       FALSE      FALSE
David 2011-06-01 2011-07-19 2011-06 FALSE      TRUE       TRUE       FALSE
Jack  2011-05-03 2012-01-03 2011-05 TRUE       TRUE       TRUE       TRUE       ...

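Below is the code I wrote. Unfortunately, it does not compare both the expiry date and dj against the calendar dates in DateColumns. For example, David should be FALSE in X1. How can I do this?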

Attempt (incorrect output):

names <- c("Tom", "David", "Jack")
dj <- as.Date(c("2011-05-01", "2011-06-01", "2011-05-03"))
exp <- as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
df <- data.frame(names, dj, exp)
df$cohort <- format(df$dj, "%Y-%m")

DateColumns <- seq.Date(as.Date("2011/05/01"), as.Date("2015/12/1"), by = "1 month")

DateColumnvalues <- t(sapply(df$exp, function(x) x > DateColumns))
df2 <- data.frame(df, DateColumnvalues)
tbl_df(df2)

output:
names         dj        exp  cohort    X1    X2    X3    X4    X5    X6
  (fctr)     (date)     (date)   (chr) (lgl) (lgl) (lgl) (lgl) (lgl) (lgl)
1    Tom 2011-05-01 2011-06-22 2011-05  TRUE  TRUE FALSE FALSE FALSE FALSE
2  David 2011-06-01 2011-07-19 2011-06  **TRUE**  TRUE  TRUE FALSE FALSE FALSE
3   Jack 2011-05-03 2012-01-03 2011-05  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
Variables not shown: X7 (lgl), X8 (lgl), X9 (lgl), X10 (lgl), X11 (lgl),
  X12 (lgl), X13 (lgl), X14 (lgl), X15 (lgl), X16 (lgl), X17 (lgl), X18
  (lgl), X19 (lgl), X20 (lgl), X21 (lgl), X22 (lgl), X23 (lgl), X24 (lgl),
  X25 (lgl), X26 (lgl), X27 (lgl), X28 (lgl), X29 (lgl), X30 (lgl), X31
  (lgl), X32 (lgl), X33 (lgl), X34 (lgl), X35 (lgl), X36 (lgl), X37 (lgl),
  X38 (lgl), X39 (lgl), X40 (lgl), X41 (lgl), X42 (lgl), X43 (lgl), X44
  (lgl), X45 (lgl), X46 (lgl), X47 (lgl), X48 (lgl), X49 (lgl), X50 (lgl),
  X51 (lgl), X52 (lgl), X53 (lgl), X54 (lgl), X55 (lgl), X56 (lgl)

Note: X1 is "2011-05-01" and X2 is "2011-06-01", so the X columns represent calendar months.

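Secondly, I want to convert this data to a "relative" form by aggregating on the number of months since joining, per cohort. For example, if customer "Dick" joined in January 2015 and expires in December 2015, his M0 should be TRUE, but M0 should refer to January 2015 (his joining month), not to a fixed calendar month.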

Names dj exp cohort                     M0 M1 M2 M3  till M55
Dick  2015-01-11 2015-12-10 2015-01     T  T  T  T
Tom   2011-05-01 2011-06-22 2011-05     T  T  F  F
David 2011-06-01 2011-07-19 2011-06     T  T  F  F

1 answer:

Answer 0 (score: 1)

For the first question, you could do:

library(data.table)
library(lubridate)
dt <- data.table(df, key=c("dj", "exp"))
dates <- setDT(transform(data.frame(start = seq.Date(as.Date("2011-05-01"), as.Date("2011-08-01"), "1 month")), 
                                    end = start + months(1) - 1), 
               key = c("start", "end"))
dcast(foverlaps(dt, dates)[, val:=TRUE], names+dj+exp+cohort~start, value.var="val", fill=FALSE)
#    names         dj        exp  cohort 2011-05-01 2011-06-01 2011-07-01 2011-08-01
# 1: David 2011-06-01 2011-07-19 2011-06      FALSE       TRUE       TRUE      FALSE
# 2:  Jack 2011-05-03 2012-01-03 2011-05       TRUE       TRUE       TRUE       TRUE
# 3:   Tom 2011-05-01 2011-06-22 2011-05       TRUE       TRUE      FALSE      FALSE
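
For comparison, the same calendar-month check can be sketched in base R without data.table or lubridate. This is only a sketch under stated assumptions: the names month_start, next_start, active and res are illustrative and do not come from the original post, and a customer is treated as active in a month when dj falls before the first day of the next month and exp falls after the first day of the month. That relaxes the literal dj <= DateColumns condition in the question so that a mid-month joiner such as Jack still counts for the month he joined, as the desired output shows.

```r
# Sample data from the question
df <- data.frame(
  names = c("Tom", "David", "Jack"),
  dj    = as.Date(c("2011-05-01", "2011-06-01", "2011-05-03")),
  exp   = as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
)
df$cohort <- format(df$dj, "%Y-%m")

# First day of each calendar month to test (shortened to four months here)
month_start <- seq(as.Date("2011-05-01"), as.Date("2011-08-01"), by = "month")

# First day of the following month, used as an exclusive upper bound (assumption)
next_start <- seq(month_start[1], by = "month",
                  length.out = length(month_start) + 1)[-1]

# One column per month: joined before the month ended and expires after it started
active <- sapply(seq_along(month_start), function(i) {
  df$dj < next_start[i] & df$exp > month_start[i]
})
colnames(active) <- format(month_start)

# cbind() on a data frame keeps the date-like column names unchanged
res <- cbind(df, active)
res
```

Unlike the dcast() result above, the rows stay in the original order of df. Replacing the end date in month_start with as.Date("2015-12-01") would cover the full range from the question.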

As for the second question, if I understand correctly, I would go with:

lst <- apply(df[2:3], 1, function(x) { x <- as.Date(x); as.logical(seq_along(seq(x[1], x[2], by="month")))  }) 
n <- max(lengths(lst)) 
res <- cbind(df, do.call(rbind, lapply(lst, function(x) `length<-`(x, n) ))) 
res[is.na(res)] <- FALSE; res
#   names         dj        exp  cohort    1    2     3     4     5     6     7     8     9
# 1   Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 2 David 2011-06-01 2011-07-19 2011-06 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 3  Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
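
If the relative columns should be labelled M0, M1, ... as in the question, one possible variation is sketched below. It rests on an assumption that differs from the code above: months are counted as calendar months between the joining month and the expiry month, so Dick (2015-01-11 to 2015-12-10) is active through M11, whereas seq(..., by = "month") counts monthly anniversaries of dj and would stop one month earlier for him. The helper months_between and the names last_active, n_months, rel and res are hypothetical, introduced only for illustration.

```r
# Sample rows from the second table in the question
df <- data.frame(
  names = c("Dick", "Tom", "David"),
  dj    = as.Date(c("2015-01-11", "2011-05-01", "2011-06-01")),
  exp   = as.Date(c("2015-12-10", "2011-06-22", "2011-07-19"))
)
df$cohort <- format(df$dj, "%Y-%m")

# Hypothetical helper: calendar months from one date to another,
# ignoring the day of the month
months_between <- function(from, to) {
  from_lt <- as.POSIXlt(from)
  to_lt   <- as.POSIXlt(to)
  12 * (to_lt$year - from_lt$year) + (to_lt$mon - from_lt$mon)
}

# Index of the last relative month in which each customer is still active
last_active <- months_between(df$dj, df$exp)

# Number of relative months to show (the question goes up to M55, i.e. 56)
n_months <- 4
rel <- sapply(0:(n_months - 1), function(k) k <= last_active)
colnames(rel) <- paste0("M", 0:(n_months - 1))

res <- cbind(df, rel)
res
```

For these three customers the first four columns come out as all TRUE for Dick and TRUE, TRUE, FALSE, FALSE for Tom and David, which is what the question's table shows.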