How to reference previous rows by account and date

Time: 2018-08-30 00:48:54

Tags: r date

I have a dataset in a format similar to the following:

Account_ID Date       Delinquency age count  
1          01/01/2016 0           1   0  
1          02/01/2016 1           2   0    
1          03/01/2016 2           3   1   
1          04/01/2016 0           4   2   
1          05/01/2016 1           5   2  
1          06/01/2016 2           6   2  
2          01/01/2016 0           1   0   
2          02/01/2016 0           2   0  
2          03/01/2016 1           3   0  
2          04/01/2016 0           4   1   
2          05/01/2016 1           5   1  
3          01/01/2016 1           1   0  
3          02/01/2016 2           2   1  
3          03/01/2016 3           3   2  
3          04/01/2016 4           4   3  
3          05/01/2016 5           5   4  
3          06/01/2016 6           6   5  

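For each account, I want to count the non-zero delinquencies in the previous 3 months. In other words, I want to create the fifth variable (count) from the first four (Account_ID, Date, Delinquency, Age). I would also like to know how to do this for the previous n months. I hope to extend this exercise to other tasks, such as finding the maximum delinquency in the previous 3 months.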

1 Answer:

Answer 0: (score: 0)

Welcome to SE!

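If you want to count, for each row, the non-zero delinquency events in the previous three months, you can combine the aggregate function with the zlag function from the TSA package, as in the code below. Because the data in your count column is hard to interpret and hard to relate to the stated condition, the example uses simulated data.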

library(lubridate)
set.seed(123)

# data simulation
df <- data.frame(id = factor(rep(0:9, 100)),
                 date = sample(seq(ymd("2010-12-01"), by = 1, length.out = 1000), 1000, replace = TRUE),
                 deliquency = sample(c(rep(0, 30), 1:5), 1000, replace = TRUE),
                 age = sample(1:10, 1000, replace = TRUE))

head(df)

# id       date deliquency age
# 1  0 2011-08-06          0  10
# 2  1 2013-08-16          0   6
# 3  2 2012-11-17          0   1
# 4  3 2012-09-12          0   9
# 5  4 2011-07-29          0   1
# 6  5 2011-02-25          0   9


# aggregation of non-zero deliquency by month
df$year_month <- df$date
day(df$year_month) <- 1
df_m <- aggregate(deliquency ~ id + year_month, data = df, sum)
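# sort by id, then by month within each id
df_m <- df_m[order(as.character(df_m$id), df_m$year_month), ]
# note: despite its name, is_zero is TRUE when deliquency is non-zero
df_m$is_zero <- df_m$deliquency > 0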

head(df_m)
# id year_month deliquency is_zero
# 1   0 2010-12-01          1    TRUE
# 10  0 2011-01-01          0   FALSE
# 19  0 2011-02-01          0   FALSE
# 29  0 2011-03-01          0   FALSE
# 39  0 2011-04-01          0   FALSE
# 65  0 2011-07-01          1    TRUE


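# count non-zero deliquency events over the three previous rows of each id
# (stored in zero_del); zlag() shifts by rows, not by calendar months, so this
# assumes one row per month with no missing months
library(TSA)
df_m_l <- by(df_m, df_m$id, function(dfx) {
    dfx$zero_del <- zlag(dfx$is_zero, 1) + zlag(dfx$is_zero, 2) + zlag(dfx$is_zero, 3)
    dfx})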

df_m_res <- do.call(rbind, df_m_l)
head(df_m_res)

The output is a data.frame showing the number of non-zero events in the previous 3 months. For example:

     id year_month deliquency is_zero zero_del
0.1   0 2010-12-01          1    TRUE       NA
0.10  0 2011-01-01          0   FALSE       NA
0.19  0 2011-02-01          0   FALSE       NA
0.29  0 2011-03-01          0   FALSE        1
0.39  0 2011-04-01          0   FALSE        0
0.65  0 2011-07-01          1    TRUE        0
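
The question also asks about an arbitrary window of n months and about other summaries such as the maximum. One possible way to generalize is sketched below in base R. The helpers `month_index` and `roll_prev_months` are hypothetical names introduced here, not part of TSA or of the code above. The sketch assumes the dates in the question are month/day/year and uses only accounts 1 and 2 from the question's table. It selects rows by calendar month rather than by row position, so missing months do not shift the window.

```r
# Month number on a continuous scale, so consecutive months differ by 1.
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  lt$year * 12 + lt$mon
}

# Hypothetical helper: apply `fun` to the values of x observed in the
# n calendar months before each row's own month (current month excluded).
# `empty` is returned when no row falls inside that window.
roll_prev_months <- function(dates, x, n = 3, fun = sum, empty = NA_real_) {
  m <- month_index(dates)
  vapply(seq_along(x), function(i) {
    in_window <- m < m[i] & m >= m[i] - n
    if (!any(in_window)) return(as.numeric(empty))
    as.numeric(fun(x[in_window]))
  }, numeric(1))
}

# Accounts 1 and 2 from the question; dates assumed to be month/day/year.
acc <- data.frame(
  Account_ID  = rep(c(1, 2), times = c(6, 5)),
  Date        = as.Date(c(sprintf("%02d/01/2016", 1:6),
                          sprintf("%02d/01/2016", 1:5)), format = "%m/%d/%Y"),
  Delinquency = c(0, 1, 2, 0, 1, 2, 0, 0, 1, 0, 1)
)

by_account <- split(acc, acc$Account_ID)

# Number of non-zero delinquencies in the previous 3 months (0 if no history).
acc$count_prev3 <- unsplit(lapply(by_account, function(g)
  roll_prev_months(g$Date, g$Delinquency > 0, n = 3, fun = sum, empty = 0)),
  acc$Account_ID)

# Maximum delinquency in the previous 3 months (NA if no history).
acc$max_prev3 <- unsplit(lapply(by_account, function(g)
  roll_prev_months(g$Date, g$Delinquency, n = 3, fun = max)),
  acc$Account_ID)

print(acc)
```

For these two accounts, count_prev3 should reproduce the count column given in the question. Changing n changes the window length, and any function that reduces a vector to a single number can be passed as fun.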