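The code below calculates cohort-wise retention. The cohort is the month of joining. So, for customers who joined in May 2015, the code counts how many are still active in each subsequent month. The final output is stored in the data frame df1 (shown below).
I need help making the column names dynamic; they are currently hard-coded in the ddply function. M0 is the month of joining, M1 is the first month after joining, M2 the second, and so on up to M(n), where n should be a variable. It can be calculated by subtracting the earliest joining date from the latest expiry date.
Unfortunately, I have not been able to compute the M0 to M(n) range dynamically.
Here is my code dump. It works but is not optimal, because I have hard-coded M0 to M3 as variables in the ddply function. So if my input data contains customers with a subscription period of more than 5 months, my code fails.
The input to the code is the following dummy data.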
customer dj exp
abc 01/05/15 25/06/15
efg 01/05/15 25/07/15
ghd 01/05/15 25/07/15
mkd 01/06/15 25/07/15
kskm 01/06/15 05/08/15
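The table above writes dates as day/month/year with two-digit years. As a minimal sketch (not part of the original post), it can be read exactly as printed with base R; the name `cohort_input` is introduced here purely for illustration.

```r
# Hypothetical helper object: the dummy table, pasted as text.
cohort_input <- read.table(
  text = "customer dj exp
abc 01/05/15 25/06/15
efg 01/05/15 25/07/15
ghd 01/05/15 25/07/15
mkd 01/06/15 25/07/15
kskm 01/06/15 05/08/15",
  header = TRUE,
  stringsAsFactors = FALSE
)

# Convert the dd/mm/yy strings to Date; the format must match the text.
cohort_input$dj  <- as.Date(cohort_input$dj,  format = "%d/%m/%y")
cohort_input$exp <- as.Date(cohort_input$exp, format = "%d/%m/%y")

str(cohort_input)
```

The format string has to describe the text being parsed: `"%d/%m/%y"` fits this table, whereas ISO strings such as `"2015-05-01"` need `"%Y-%m-%d"`.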
Reproducible code.
library(zoo)
library(plyr)
customer<-c("abc","efg","ghd","mkd","kskm")
dj<-c("2015-05-01", "2015-05-01", "2015-05-01","2015-06-01","2015-06-01")
exp<-c("2015-06-25", "2015-07-25", "2015-07-25","2015-07-01","2015-08-05")
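df <- data.frame(customer,dj,exp,stringsAsFactors=FALSE)
# The strings above are in ISO format, so parse them as year-month-day
df$dj <- as.Date(df$dj,"%Y-%m-%d")
df$exp <- as.Date(df$exp,"%Y-%m-%d")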
# The data in the file has different variable names than your example data
# so I'm changing them to match
names(df)[1:3] <- c("customer","dj","exp")
# Make a variable called Cohort that contains only the year and month of joining
# as.yearmon() comes from the 'zoo' package
df$Cohort <- as.yearmon(df$dj)
# Calculate the difference in months between date of expiry and date of joining
df$MonthDiff <- ceiling((df$exp-df$dj)/30)
#df$MonthDiff <- 12*(as.yearmon(df$exp+months(1))-df$Cohort)
range<-as.integer(ceiling((max(df$exp)-min(df$dj)))/30)
# Use ddply() from the 'plyr' package to get the frequency of subjects that are
# still active after 0, 1, 2, and 3 months.
df1 <- ddply(df,.(Cohort),summarize,
M0 = sum(MonthDiff > 0),
M1 = sum(MonthDiff > 1),
M2 = sum(MonthDiff > 2),
M3 = sum(MonthDiff > 3)
)
df1
df1
Cohort M0 M1 M2 M3
1 May 2015 3 3 2 0
2 Jun 2015 2 2 1 0
The above is the working output. The requirement is to make the columns M0 to M3 dynamic.
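The number of month columns described above can be derived from the dates rather than typed in. The following is a minimal base R sketch under two assumptions: the same 30-day month approximation the question uses, and illustrative names (`expiry`, `span_days`, `n_months`, `cal_months`, `month_cols`) that do not appear in the original code. A calendar-month variant is included for comparison.

```r
dj     <- as.Date(c("2015-05-01", "2015-05-01", "2015-05-01",
                    "2015-06-01", "2015-06-01"))
expiry <- as.Date(c("2015-06-25", "2015-07-25", "2015-07-25",
                    "2015-07-01", "2015-08-05"))

# Span from the earliest joining date to the latest expiry date, in days.
span_days <- as.numeric(max(expiry) - min(dj), units = "days")

# Whole 30-day "months" in that span.
n_months <- as.integer(span_days %/% 30)

# Alternative: difference in calendar months, ignoring the day of month.
lt_max <- as.POSIXlt(max(expiry))
lt_min <- as.POSIXlt(min(dj))
cal_months <- 12L * (lt_max$year - lt_min$year) + (lt_max$mon - lt_min$mon)

# Column labels M0 .. M(n), generated instead of hard-coded.
month_cols <- paste0("M", 0:n_months)
month_cols
```

For this data both variants give 3, so the labels run from M0 to M3. The two can differ on other inputs, so which one is appropriate depends on how a "month" of subscription is defined.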
Answer 0: (score: 0)
Try inserting this after creating range:
for(i in 0:range) df <- within(df,assign(paste0("M",i),MonthDiff>i))
df1 <- ddply(df,.(Cohort),function(x) colSums(x[,paste0("M",0:range)]))
df1
# Cohort M0 M1 M2 M3
# 1 May 2015 3 3 2 0
# 2 Jun 2015 2 1 1 0
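For comparison, the same table can be built without a loop and without plyr. This is a hedged base R sketch, not the answerer's method: it swaps in `vapply()` and `rowsum()`, uses the date vectors from the question's reproducible code, and all object names (`expiry`, `cohort`, `month_diff`, `n_months`, `active`, `retention`) are illustrative.

```r
dj     <- as.Date(c("2015-05-01", "2015-05-01", "2015-05-01",
                    "2015-06-01", "2015-06-01"))
expiry <- as.Date(c("2015-06-25", "2015-07-25", "2015-07-25",
                    "2015-07-01", "2015-08-05"))

# Cohort label: year and month of joining.
cohort <- format(dj, "%Y-%m")

# Subscription length in 30-day months, rounded up (as in the question).
month_diff <- ceiling(as.numeric(expiry - dj, units = "days") / 30)

# Highest month offset to report, derived from the data.
n_months <- as.integer(as.numeric(max(expiry) - min(dj), units = "days") %/% 30)

# One logical column per offset: is the customer still active after i months?
active <- vapply(0:n_months,
                 function(i) month_diff > i,
                 logical(length(month_diff)))
colnames(active) <- paste0("M", 0:n_months)

# Sum the flags within each cohort; rows are cohorts, columns are M0..M(n).
retention <- rowsum(active + 0L, group = cohort)
retention
```

With these vectors the result is 3 3 2 0 for 2015-05 and 2 1 1 0 for 2015-06, which agrees with the output above. The question's own output shows 2 2 1 0 for June; that is what the 25/07/15 expiry listed for mkd in the dummy table produces, whereas the reproducible code uses 2015-07-01 for that customer.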