How to create dynamic columns in R

Time: 2015-12-02 11:51:27

Tags: r variables calculated-columns

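The code below calculates cohort retention. A cohort is the month in which customers joined, so the code counts, for example, how many customers joined in May 2015 and how many of them were still active in each following month. The final output is stored in the data frame df1 (shown below).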

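I need help making the column names dynamic; they are currently hard-coded in the ddply call. M0 is the joining month, M1 is the first month after joining, M2 is the second month after joining, and so on up to M(n), where n should be a variable. n can be calculated by subtracting the earliest joining date from the latest expiry date.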

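Unfortunately, I have not been able to compute the M0 to M(n) range dynamically.

Here is my code. It works but is not optimal, because I have hard-coded M0 through M3 as variables in the ddply call. As a result, if my input data contains customers with a subscription period of more than 5 months, my code fails.

The input to the code is the following dummy data.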

    customer  dj        exp
    abc       01/05/15  25/06/15
    efg       01/05/15  25/07/15
    ghd       01/05/15  25/07/15
    mkd       01/06/15  25/07/15
    kskm      01/06/15  05/08/15

Reproducible code:

    library(zoo)
    library(plyr)

    customer<-c("abc","efg","ghd","mkd","kskm")
    dj<-c("2015-05-01", "2015-05-01", "2015-05-01","2015-06-01","2015-06-01")
    exp<-c("2015-06-25", "2015-07-25", "2015-07-25","2015-07-01","2015-08-05")
    df <- data.frame(customer, dj, exp)
    # The date strings above are in ISO format (YYYY-MM-DD)
    df$dj <- as.Date(df$dj, "%Y-%m-%d")
    df$exp <- as.Date(df$exp, "%Y-%m-%d")

    # The data in the file has different variable names than your example data
    # so I'm changing them to match
    names(df)[1:3] <- c("customer","dj","exp")

    # Make a variable called Cohort that contains only the year and month of joining
    # as.yearmon() comes from the 'zoo' package
    df$Cohort <- as.yearmon(df$dj)

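    # Calculate the difference in months between date of expiry and date of joining,
    # approximating one month as 30 days
    df$MonthDiff <- ceiling(as.numeric(df$exp - df$dj, units = "days") / 30)
    #df$MonthDiff <- 12*(as.yearmon(df$exp+months(1))-df$Cohort)

    # Whole 30-day periods between the earliest joining date and the latest expiry date
    # (note that this variable shadows base::range)
    range <- as.integer(as.numeric(max(df$exp) - min(df$dj), units = "days") / 30)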

    # Use ddply() from the 'plyr' package to get the frequency of subjects that are
    # still active after 0, 1, 2, and 3 months.

    df1 <- ddply(df,.(Cohort),summarize,
                 M0 = sum(MonthDiff > 0), 
                 M1 = sum(MonthDiff > 1),
                 M2 = sum(MonthDiff > 2),
                 M3 = sum(MonthDiff > 3)

    )

    df1
    #      Cohort M0 M1 M2 M3
    # 1 May 2015  3  3  2  0
    # 2 Jun 2015  2  2  1  0

The above is the working output. The requirement is to make the columns M0 through M3 dynamic.

1 Answer:

Answer 0: (score: 0)

Try inserting this after range is created:

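    # Add one logical column per month offset: M0, M1, ..., M<range>
    for (i in 0:range) df <- within(df, assign(paste0("M", i), MonthDiff > i))

    # Per cohort, count the customers still active at each offset
    df1 <- ddply(df, .(Cohort), function(x) colSums(x[, paste0("M", 0:range), drop = FALSE]))

    df1
    #     Cohort M0 M1 M2 M3
    # 1 May 2015  3  3  2  0
    # 2 Jun 2015  2  1  1  0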
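
The same idea can also be sketched without the loop and without adding helper columns to df. The version below is only an illustration in base R, not the approach from the answer above: it assumes the same 30-day approximation of a month, labels cohorts with format() instead of zoo's as.yearmon(), and the names n, active, and retention are made up for this example.

    # Dummy data from the question (ISO dates)
    df <- data.frame(
      customer = c("abc", "efg", "ghd", "mkd", "kskm"),
      dj  = as.Date(c("2015-05-01", "2015-05-01", "2015-05-01", "2015-06-01", "2015-06-01")),
      exp = as.Date(c("2015-06-25", "2015-07-25", "2015-07-25", "2015-07-01", "2015-08-05"))
    )

    # Cohort label (year-month of joining) and subscription length in 30-day months
    df$Cohort <- format(df$dj, "%Y-%m")
    df$MonthDiff <- ceiling(as.numeric(df$exp - df$dj, units = "days") / 30)

    # Largest month offset, derived from the data instead of being hard-coded
    n <- as.integer(as.numeric(max(df$exp) - min(df$dj), units = "days") %/% 30)

    # Logical matrix: row = customer, column = offset 0..n, TRUE if still active
    active <- outer(df$MonthDiff, 0:n, ">")
    colnames(active) <- paste0("M", 0:n)

    # Count the active customers per cohort (multiplying by 1L turns TRUE/FALSE into 1/0)
    retention <- rowsum(active * 1L, group = df$Cohort)
    retention

Because n comes from the dates themselves, a longer subscription in the input simply produces more columns (M4, M5, ...) with no change to the code.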