Get values from an array of variables based on the value of another variable

Time: 2014-10-22 16:24:00

Tags: arrays r

Suppose I have the following dataset:

structure(list(AccountNumber = 1:5, ActivationDate = c(201001L, 
201002L, 201001L, 201010L, 201008L), Payments_201001 = c(100L, 
NA, 2342L, NA, NA), Payments_201002 = c(200L, 100L, 235L, NA, 
NA), Payments_201003 = c(100L, 100L, 111L, NA, NA), Payments_201004 = c(100L, 
100L, 144L, NA, NA), Payments_201005 = c(150L, 100L, NA, NA, 
NA), Payments_201006 = c(150L, 100L, NA, NA, NA), Payments_201007 = c(NA, 
100L, NA, NA, NA), Payments_201008 = c(NA, 100L, NA, NA, 144L
), Payments_201009 = c(NA, NA, NA, NA, 159L), Payments_201010 = c(NA, 
NA, NA, 100L, 100L)), .Names = c("AccountNumber", "ActivationDate", 
"Payments_201001", "Payments_201002", "Payments_201003", "Payments_201004", 
"Payments_201005", "Payments_201006", "Payments_201007", "Payments_201008", 
"Payments_201009", "Payments_201010"), class = "data.frame", row.names = c(NA, 
-5L))

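Basically, I have one variable that shows when the account was activated, and a series of payment variables, one per calendar month of data.

What I want to do is create a new array Payments1-Payments10 that holds the account's payments for months 1 through 10 after activation. Specifically, Payments1 should correspond to the first month after activation (data row 1 -> the value should come from Payments_201002 -> 200), Payments2 to the amount 2 months after activation, and so on.

What I tried was shifting the elements to the left with the following script: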

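# Shift a vector left by the number of leading NAs, padding the end with NA
single.shift <- function(x) {
  r <- rle(is.na(x))
  # No leading NA: nothing to shift
  if (!r$values[1]) return(x)
  num <- r$lengths[1]
  c(x[-seq_len(num)], rep(NA, num))
}
# Apply row by row; apply() returns the results as columns, so transpose back
t(apply(x, 1, single.shift))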

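Because of the specifics of my data (there are also payments in the activation month, history, etc.), this does not work for my case.

In SAS, I would do the following. Create 2 arrays: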
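To illustrate the limitation, here is a small sketch with made-up vectors (the names `late_start` and `paid_at_activation` are hypothetical). It suggests that `single.shift` only moves rows that start with `NA`, so a row with a payment in the activation month keeps that payment in the first position instead of the following month's value.

```r
# Same logic as the question's single.shift, repeated so the sketch is self-contained
single.shift <- function(x) {
  r <- rle(is.na(x))
  if (!r$values[1]) return(x)
  num <- r$lengths[1]
  c(x[-seq_len(num)], rep(NA, num))
}

# Made-up row with two leading NAs: the values move to the front
late_start <- c(NA, NA, 5, 6)
single.shift(late_start)

# Made-up row with a payment in the activation month: returned unchanged,
# so position 1 still holds the activation-month amount (100), not 200
paid_at_activation <- c(100, 200, NA)
single.shift(paid_at_activation)
```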

Array Pay1 Payments_201001-Payments_201010;
Array Pay2 Payments1-Payments10;

I would use ActivationDate as an index and create a new variable -> for example, if ActivationDate = 201001 then IndexVar = 1, if ActivationDate = 201003 then IndexVar = 3, and so on.

Since SAS works row by row, I can use a loop:

do i = 1 to 10-IndexVar; /*(since for the 10th month there's no one month AFTER)*/
Pay2[i] = Pay1[IndexVar+i];
end;

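I have not been able to do this in R so far.

1 answer:

Answer 0: (score: 0)

Here is how I solved it. I do not use ActivationDate at all, because I assume it is defined by the first non-NA entry. I also used the handy na.trim function from the zoo package to remove the leading NAs (sides = "left").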
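For intuition about the trimming step, here is a minimal base-R sketch of the same idea. The helper `trim_left` and the vector `x` are hypothetical and only mimic what a left-sided trim does; this is not zoo's implementation.

```r
# Hypothetical helper: keep everything from the first non-NA value onward
trim_left <- function(x) x[cumsum(!is.na(x)) > 0]

x <- c(NA, NA, 100, NA, 144)   # made-up payment history
y <- trim_left(x)              # leading NAs dropped, the inner NA is kept
y[is.na(y)] <- 0               # months without a payment count as zero
cumsum(y)                      # running total since the first payment
```

Only the leading NAs are removed; gaps after the first payment stay in place and are treated as zero payments before the cumulative sum.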

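library(zoo)  # provides na.trim

# For each account: drop the leading NAs, treat later NAs as 0,
# and take the running total of payments (the result is a list, one vector per row)
payment_cumul <- apply(df[, -(1:2)], 1, function(x) 
{
  y <- na.trim(x, sides = "left")
  y[is.na(y)] <- 0
  cumsum(y)  
})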

get_i_months <- function(i) 
{
  sapply(payment_cumul, function(x) {
    z <- x[i]
    if (is.na(z)) return(x[length(x)])
    z
  })
}
# payments for the first month since activation
get_i_months(1)
#Payments_201001 Payments_201002 Payments_201001 Payments_201010 Payments_201008 
#            100             100            2342             100             144 

# payments for 10 first months
get_i_months(10)
#Payments_201010 Payments_201010 Payments_201010 Payments_201010 Payments_201010 
#            800             700            2832             100             403