R: Create a recursive variable by group in data.table

Time: 2018-11-09 08:23:25

Tags: r group-by data.table

I have a data.table like this (except that I have many more observations):

 name  id       time start rate payment
 Anna 100 2000-01-01   100    4      15
 Anna 100 2000-02-01   100    4      20
 Anna 100 2000-03-01   100    4      25
Jenny 250 2008-01-01   200    5      10
Jenny 250 2008-02-01   200    5      20
Jenny 250 2008-03-01   200    5      30
Jenny 250 2008-04-01   200    5      35

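I want to create a new variable, say new_var, by group (name, id). For the first observation in each name, id group it should equal start, and for every later observation it should equal its own previous value multiplied by (1 + rate), minus payment. That is, for name = Anna and id = 100, new_var[1] = 100, new_var[2] = 100 * (1 + 4) - 20 = 480 and new_var[3] = 480 * (1 + 4) - 25 = 2375, where 480 is the value of new_var[2]. So the whole data.table with this new variable would look like this:

 name  id       time start rate payment new_var
 Anna 100 2000-01-01   100    4      15     100
 Anna 100 2000-02-01   100    4      20     480
 Anna 100 2000-03-01   100    4      25    2375
Jenny 250 2008-01-01   200    5      10     200
Jenny 250 2008-02-01   200    5      20    1180
Jenny 250 2008-03-01   200    5      30    7050
Jenny 250 2008-04-01   200    5      35   42265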

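Is it possible to achieve this somehow, preferably without a loop?

2 Answers:

Answer 0: (score: 2)

I don't know how to avoid the loop, but you can run one inside data.table, and I think it will still be quite efficient: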

library(data.table)

### DT re-created with the following code
DT <- data.table(
        name = c("Anna","Anna","Anna","Jenny","Jenny","Jenny","Jenny"),
        id = c(100L,100L,100L,250L,250L,250L,250L), 
        time = as.Date(c("2000-01-01","2000-02-01","2000-03-01","2008-01-01","2008-02-01",
                         "2008-03-01","2008-04-01")),
        start = c(100,100,100,200,200,200,200), 
        rate = c(4,4,4,5,5,5,5),
        payment = c(15,20,25,10,20,30,35))
###

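# Compute new_var for one (name, id) group: the first value is start,
# each later value is the previous one times (1 + rate) minus payment.
computeNewVar <- function(subDT){
  v <- subDT$start
  # Groups with a single row need no update; this also avoids 2:1 counting down
  if(nrow(subDT)>1){
    for(i in 2:nrow(subDT)){
      v[i] <- v[i-1] * (1+subDT$rate[i]) - subDT$payment[i]
    }
  }
  v
}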

DT[,new_var:=computeNewVar(.SD),by=.(name,id)]

Result:

> DT
    name  id       time start rate payment new_var
1:  Anna 100 2000-01-01   100    4      15     100
2:  Anna 100 2000-02-01   100    4      20     480
3:  Anna 100 2000-03-01   100    4      25    2375
4: Jenny 250 2008-01-01   200    5      10     200
5: Jenny 250 2008-02-01   200    5      20    1180
6: Jenny 250 2008-03-01   200    5      30    7050
7: Jenny 250 2008-04-01   200    5      35   42265
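
If the explicit for loop is the concern, the same recurrence can also be written as a fold with base R's Reduce(accumulate = TRUE). The following is only a sketch: the helper name computeNewVarReduce is made up for illustration, and Reduce still iterates internally, so it should not be expected to run faster than the loop above.

```r
library(data.table)

DT <- data.table(
  name    = c("Anna", "Anna", "Anna", "Jenny", "Jenny", "Jenny", "Jenny"),
  id      = c(100L, 100L, 100L, 250L, 250L, 250L, 250L),
  start   = c(100, 100, 100, 200, 200, 200, 200),
  rate    = c(4, 4, 4, 5, 5, 5, 5),
  payment = c(15, 20, 25, 10, 20, 30, 35))

# Hypothetical helper: folds over row indices 2..n of one group,
# seeded with the group's first start value.
computeNewVarReduce <- function(start, rate, payment) {
  Reduce(
    function(prev, i) prev * (1 + rate[i]) - payment[i],
    seq_along(start)[-1],   # empty for one-row groups, so only init is returned
    init = start[1],
    accumulate = TRUE       # keep every intermediate value, not just the last
  )
}

DT[, new_var := computeNewVarReduce(start, rate, payment), by = .(name, id)]
```

Passing the columns directly, instead of .SD, keeps the helper usable on plain vectors as well.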

Answer 1: (score: 1)

I'm a bit rusty on numerical methods, but a variation is possible.
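
One possible variation, offered as a sketch and not necessarily what this answer had in mind, is to unroll the recurrence into a closed form so that it can be computed with cumprod and cumsum and no loop. Writing P[t] for the product of (1 + rate[s]) over s = 2..t (with P[1] = 1), the recurrence gives new_var[t] = P[t] * (start[1] - sum over s = 2..t of payment[s] / P[s]). The helper name computeNewVarClosed is hypothetical. The sketch assumes 1 + rate is never zero, and for long series with large rates P can overflow or lose precision, so the loop may be the safer choice there.

```r
library(data.table)

DT <- data.table(
  name    = c("Anna", "Anna", "Anna", "Jenny", "Jenny", "Jenny", "Jenny"),
  id      = c(100L, 100L, 100L, 250L, 250L, 250L, 250L),
  start   = c(100, 100, 100, 200, 200, 200, 200),
  rate    = c(4, 4, 4, 5, 5, 5, 5),
  payment = c(15, 20, 25, 10, 20, 30, 35))

# Hypothetical helper: closed form of
#   v[1] = start[1];  v[t] = v[t-1] * (1 + rate[t]) - payment[t]
computeNewVarClosed <- function(start, rate, payment) {
  P <- cumprod(c(1, 1 + rate[-1]))      # cumulative growth; row 1 contributes none
  discounted <- c(0, payment[-1]) / P   # row 1's payment is never subtracted
  P * (start[1] - cumsum(discounted))
}

DT[, new_var := computeNewVarClosed(start, rate, payment), by = .(name, id)]
```

Because the result comes from floating-point division and multiplication, compare it with all.equal rather than ==.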
