Suppose I have the following dataset:
structure(list(AccountNumber = 1:5, ActivationDate = c(201001L,
201002L, 201001L, 201010L, 201008L), Payments_201001 = c(100L,
NA, 2342L, NA, NA), Payments_201002 = c(200L, 100L, 235L, NA,
NA), Payments_201003 = c(100L, 100L, 111L, NA, NA), Payments_201004 = c(100L,
100L, 144L, NA, NA), Payments_201005 = c(150L, 100L, NA, NA,
NA), Payments_201006 = c(150L, 100L, NA, NA, NA), Payments_201007 = c(NA,
100L, NA, NA, NA), Payments_201008 = c(NA, 100L, NA, NA, 144L
), Payments_201009 = c(NA, NA, NA, NA, 159L), Payments_201010 = c(NA,
NA, NA, 100L, 100L)), .Names = c("AccountNumber", "ActivationDate",
"Payments_201001", "Payments_201002", "Payments_201003", "Payments_201004",
"Payments_201005", "Payments_201006", "Payments_201007", "Payments_201008",
"Payments_201009", "Payments_201010"), class = "data.frame", row.names = c(NA,
-5L))
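Basically, I have one variable showing when the account was activated, and a series of payment variables, one per data month.
What I want to do is create a new set of columns, Payments1-Payments10, holding each account's payments for months 1 through 10 after activation. Specifically, Payments1 should correspond to the first month after activation (for data row 1 the value should come from Payments_201002, i.e. 200), Payments2 to the amount two months after activation, and so on.
What I tried was shifting the elements to the left with the following script: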
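# Move a run of leading NAs to the end of the vector.
single.shift <- function(x) {
  r <- rle(is.na(x))
  if (!r$values[1]) return(x)   # no leading NAs: nothing to shift
  num <- r$lengths[1]           # length of the leading NA run
  c(x[-1:-num], rep(NA, num))
}
# x is the data frame shown above
t(apply(x, 1, single.shift))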
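Because of the specifics of my data (there are also payments in the activation month, history, and so on), this does not work in my case.
If this were SAS, I would do the following. Create two arrays: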
Array Pay1 Payments_201001-Payments_201010;
Array Pay2 Payments1-Payments10;
I would use the position of ActivationDate to create a new variable -> for example, if ActivationDate = 201001 then IndexVar = 1, if ActivationDate = 201003 then IndexVar = 3, and so on.
Since SAS works row by row, I could use a loop:
do i = 1 to 10-IndexVar; /*(since for the 10th month there's no one month AFTER)*/
Pay2[i] = Pay1[IndexVar+i];
end;
I have not been able to do this in R.
Answer 0 (score: 0)
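Here is how I would solve it. I do not use ActivationDate at all, because I assume it is defined by the first non-NA entry. I also use the handy na.trim function from the zoo package to remove the leading NAs (the months before the first payment).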
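library(zoo)  # provides na.trim()
# df is the data frame from the question; columns 1:2 are AccountNumber and ActivationDate.
# For each account: drop leading NAs, count missing months as 0, then accumulate.
payment_cumul <- apply(df[, -(1:2)], 1, function(x)
{
  y <- na.trim(x, sides = "left")
  y[is.na(y)] <- 0
  cumsum(y)
})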
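As a quick check of what each element of payment_cumul holds, here is a minimal base-R sketch of the same per-row steps for a single account (account 5 from the question). It swaps na.trim for a plain cumsum-based filter, so it is an illustration of the idea rather than the zoo implementation; the names x, y and cum are only for this example.

```r
# Last four monthly payments of account 5 (values copied from the question).
x <- c(Payments_201007 = NA, Payments_201008 = 144,
       Payments_201009 = 159, Payments_201010 = 100)

# Drop leading NAs: keep everything from the first non-NA value onward.
y <- x[cumsum(!is.na(x)) > 0]

# Treat any remaining missing month as a payment of 0, then accumulate.
y[is.na(y)] <- 0
cum <- cumsum(y)
cum
```

The first element of cum is the payment in the first month with data, and the last element is the total paid so far.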
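# Cumulative payments after i months for every account.
get_i_months <- function(i)
{
  sapply(payment_cumul, function(x) {
    z <- x[i]
    # fewer than i months of history: fall back to the latest cumulative total
    if (is.na(z)) return(x[length(x)])
    z
  })
}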
# cumulative payments after the first month with data (the activation month counts as month 1)
get_i_months(1)
# Payments_201001 Payments_201002 Payments_201001 Payments_201010 Payments_201008
#             100             100            2342             100             144
# cumulative payments after the first 10 months
get_i_months(10)
# Payments_201010 Payments_201010 Payments_201010 Payments_201010 Payments_201010
#             800             700            2832             100             403
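If the layout described in the question is what is needed (Payments1 is the month after ActivationDate, not a running total), the SAS-style index loop can also be sketched in base R. This is a sketch under assumptions: every ActivationDate must appear among the Payments_ column suffixes, and the names pay_cols, months, index_var, shifted and result are made up for this example.

```r
# Rebuild the example data (values copied from the question).
df <- data.frame(
  AccountNumber   = 1:5,
  ActivationDate  = c(201001L, 201002L, 201001L, 201010L, 201008L),
  Payments_201001 = c(100L, NA, 2342L, NA, NA),
  Payments_201002 = c(200L, 100L, 235L, NA, NA),
  Payments_201003 = c(100L, 100L, 111L, NA, NA),
  Payments_201004 = c(100L, 100L, 144L, NA, NA),
  Payments_201005 = c(150L, 100L, NA, NA, NA),
  Payments_201006 = c(150L, 100L, NA, NA, NA),
  Payments_201007 = c(NA, 100L, NA, NA, NA),
  Payments_201008 = c(NA, 100L, NA, NA, 144L),
  Payments_201009 = c(NA, NA, NA, NA, 159L),
  Payments_201010 = c(NA, NA, NA, 100L, 100L)
)

pay_cols <- grep("^Payments_", names(df), value = TRUE)
months   <- as.integer(sub("^Payments_", "", pay_cols))

# Position of the activation month among the payment columns (IndexVar in the SAS sketch).
# Assumption: every ActivationDate matches one of the column suffixes.
index_var <- match(df$ActivationDate, months)

n   <- length(pay_cols)
pay <- as.matrix(df[pay_cols])

# For each row, copy columns IndexVar+1, IndexVar+2, ... into positions 1, 2, ...
shifted <- t(sapply(seq_len(nrow(pay)), function(r) {
  src <- index_var[r] + seq_len(n)   # source column for each new position
  out <- rep(NA_integer_, n)
  ok  <- src <= n                    # months beyond the data stay NA
  out[ok] <- pay[r, src[ok]]
  out
}))
colnames(shifted) <- paste0("Payments", seq_len(n))

result <- cbind(df[c("AccountNumber", "ActivationDate")], shifted)
result
```

For row 1 (activated in 201001), Payments1 is taken from Payments_201002, which is 200 as the question expects. An account activated in the last data month, such as row 4, gets NA in every new column.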