I have a dataset in a format similar to the following:
Account_ID Date Delinquency age count
1 01/01/2016 0 1 0
1 02/01/2016 1 2 0
1 03/01/2016 2 3 1
1 04/01/2016 0 4 2
1 05/01/2016 1 5 2
1 06/01/2016 2 6 2
2 01/01/2016 0 1 0
2 02/01/2016 0 2 0
2 03/01/2016 1 3 0
2 04/01/2016 0 4 1
2 05/01/2016 1 5 1
3 01/01/2016 1 1 0
3 02/01/2016 2 2 1
3 03/01/2016 3 3 2
3 04/01/2016 4 4 3
3 05/01/2016 5 5 4
3 06/01/2016 6 6 5
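For each account, I want to count the number of non-zero delinquencies in the previous 3 months, i.e. I want to create the count variable from the first 4 variables (Account_ID, Date, Delinquency, Age). I would like to know how to do this for the previous n months. I am also hoping to extend this exercise to other tasks, such as finding the maximum delinquency in the previous 3 months.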
Answer 0 (score: 0)
Welcome to SE!
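If you want to count, row by row, the non-zero delinquency events in the previous three months, you can use the aggregate function together with the zlag function from the TSA package, as shown in the code below. Because the data you provided in the count column is hard to interpret and hard to relate to the stated condition, the data below is simulated, modeled on your example.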
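To make the lag-and-sum idea concrete before the full code, here is a minimal base-R sketch. The helper `lag_vec` is hypothetical (it is not part of TSA); it assumes that lagging means shifting a vector back by `d` positions and padding the start with NA, which is the behavior the output at the end of this answer suggests for `zlag`.

```r
# Hypothetical helper: shift `x` back by `d` positions, padding the start with NA.
lag_vec <- function(x, d = 1) {
  n <- length(x)
  if (d >= n) return(rep(NA, n))
  c(rep(NA, d), x[seq_len(n - d)])
}

# TRUE marks a month with non-zero delinquency.
nonzero <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)

# For each row, count the TRUE values among the three preceding rows.
# The first three rows are NA because a full three-row window is not available yet.
prev3 <- lag_vec(nonzero, 1) + lag_vec(nonzero, 2) + lag_vec(nonzero, 3)
print(prev3)
```

Adding the lags 1 to 3 of a logical vector counts how many of the three preceding rows are TRUE, which is the quantity asked for.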
library(lubridate)
set.seed(123)
# data simulation
df <- data.frame(
  id = factor(rep(0:9, 100)),
  date = sample(seq(ymd("2010-12-01"), by = 1, length.out = 1000), 1000, replace = TRUE),
  deliquency = sample(c(rep(0, 30), 1:5), 1000, replace = TRUE),
  age = sample(1:10, 1000, replace = TRUE))
head(df)
# id date deliquency age
# 1 0 2011-08-06 0 10
# 2 1 2013-08-16 0 6
# 3 2 2012-11-17 0 1
# 4 3 2012-09-12 0 9
# 5 4 2011-07-29 0 1
# 6 5 2011-02-25 0 9
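# aggregate delinquency by account and calendar month
df$year_month <- df$date
day(df$year_month) <- 1  # truncate each date to the first day of its month
df_m <- aggregate(deliquency ~ id + year_month, data = df, sum)
# sort by account, then by month within each account
df_m <- df_m[order(as.character(df_m$id), df_m$year_month), ]
# note: despite its name, is_zero is TRUE when the monthly delinquency is non-zero
df_m$is_zero <- df_m$deliquency > 0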
head(df_m)
# id year_month deliquency is_zero
# 1 0 2010-12-01 1 TRUE
# 10 0 2011-01-01 0 FALSE
# 19 0 2011-02-01 0 FALSE
# 29 0 2011-03-01 0 FALSE
# 39 0 2011-04-01 0 FALSE
# 65 0 2011-07-01 1 TRUE
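Note that the aggregated table only contains months in which an account has at least one record (for id 0 above, 2011-05 and 2011-06 are absent). A row-based lag therefore looks back three rows, which may span more than three calendar months. If calendar months matter, one option is to fill the missing months with zeros first. The following is a sketch on a toy table; `toy_m`, `grid` and `toy_full` are illustrative names, and treating a missing month as zero delinquency is an assumption.

```r
# Toy monthly table with a gap: id "a" has no rows for May and June 2011.
toy_m <- data.frame(
  id = c("a", "a", "a"),
  year_month = as.Date(c("2011-03-01", "2011-04-01", "2011-07-01")),
  deliquency = c(0, 2, 1)
)

# Build the complete id x month grid over the observed date range.
months <- seq(min(toy_m$year_month), max(toy_m$year_month), by = "month")
grid <- expand.grid(id = unique(toy_m$id), year_month = months,
                    stringsAsFactors = FALSE)

# Left-join the observed rows onto the grid; months without data become NA.
toy_full <- merge(grid, toy_m, by = c("id", "year_month"), all.x = TRUE)

# Assumption: a month without records counts as zero delinquency.
toy_full$deliquency[is.na(toy_full$deliquency)] <- 0
toy_full <- toy_full[order(toy_full$id, toy_full$year_month), ]
print(toy_full)
```

After this step every account has one row per month, so lagging by rows and lagging by months coincide.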
# count months with non-zero delinquency among the three preceding rows, per account
library(TSA)
df_m_l <- by(df_m, df_m$id, function(dfx) {
  # zlag(x, d) shifts x back by d rows; the sum counts TRUE values in rows t-1, t-2 and t-3
  dfx$zero_del <- zlag(dfx$is_zero, 1) + zlag(dfx$is_zero, 2) + zlag(dfx$is_zero, 3)
  dfx
})
df_m_res <- do.call(rbind, df_m_l)
head(df_m_res)
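The result is a data.frame whose zero_del column gives, for each row, the number of months with non-zero delinquency among the three preceding rows of the same account; the first three rows of each account are NA because a full window is not available. For example, the output is: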
id year_month deliquency is_zero zero_del
0.1 0 2010-12-01 1 TRUE NA
0.10 0 2011-01-01 0 FALSE NA
0.19 0 2011-02-01 0 FALSE NA
0.29 0 2011-03-01 0 FALSE 1
0.39 0 2011-04-01 0 FALSE 0
0.65 0 2011-07-01 1 TRUE 0
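For the more general request in the question (previous n months, and other summaries such as the maximum), the same idea can be sketched in base R without TSA. The helper `roll_prev` is hypothetical, not a library function. It assumes one row per account per month, sorted by date, with no missing months, and, unlike the zlag version above, it uses partial windows at the start of each account instead of returning NA. The data are the Account 1 and Account 2 delinquency values from the question.

```r
# Hypothetical helper: apply `fun` to the previous `n` observations of `x`
# (the current observation is excluded). Windows are defined by rows.
# `empty` is returned for the first row, which has no previous observations.
roll_prev <- function(x, n, fun, empty = NA) {
  vapply(seq_along(x), function(i) {
    if (i == 1) return(as.numeric(empty))
    window <- x[max(1, i - n):(i - 1)]
    as.numeric(fun(window))
  }, numeric(1))
}

acc <- data.frame(
  Account_ID  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Delinquency = c(0, 1, 2, 0, 1, 2, 0, 0, 1, 0, 1)
)

n <- 3  # look-back length in rows (months, under the no-gap assumption)

# Count of non-zero delinquencies in the previous n rows, per account.
acc$count_prev <- ave(acc$Delinquency, acc$Account_ID,
  FUN = function(x) roll_prev(x, n, function(w) sum(w != 0), empty = 0))

# Maximum delinquency in the previous n rows, per account (NA for the first row).
acc$max_prev <- ave(acc$Delinquency, acc$Account_ID,
  FUN = function(x) roll_prev(x, n, max))

print(acc)
```

Changing `n` gives other look-back lengths, and passing a different function (for example `mean`) gives other rolling summaries over the same window.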