I have a data.table like this (except that I have many more observations):
name id time start rate payment
Anna 100 2000-01-01 100 4 15
Anna 100 2000-02-01 100 4 20
Anna 100 2000-03-01 100 4 25
Jenny 250 2008-01-01 200 5 10
Jenny 250 2008-02-01 200 5 20
Jenny 250 2008-03-01 200 5 30
Jenny 250 2008-04-01 200 5 35
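I would like to create a new variable, say new_var, by group (name, id). For the first observation in each name, id group it should equal start, and after that it should equal its own previous value multiplied by (1 + rate), minus payment. That is, for name = Anna and id = 100, new_var[1] = 100, new_var[2] = 100 * (1 + 4) - 20 = 480 and new_var[3] = 480 * (1 + 4) - 25 = 2375, where 480 is the value of new_var[2]. The whole data.table with this new variable would then look like this:
name id time start rate payment new_var
Anna 100 2000-01-01 100 4 15 100
Anna 100 2000-02-01 100 4 20 480
Anna 100 2000-03-01 100 4 25 2375
Jenny 250 2008-01-01 200 5 10 200
Jenny 250 2008-02-01 200 5 20 1180
Jenny 250 2008-03-01 200 5 30 7050
Jenny 250 2008-04-01 200 5 35 42265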
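Written as a recurrence within one (name, id) group with rows i = 1, ..., n, the rule reads as follows. The symbol v_i is only shorthand for new_var[i] and is not a name from the data:

```latex
v_1 = \mathrm{start}_1, \qquad
v_i = v_{i-1}\,(1 + \mathrm{rate}_i) - \mathrm{payment}_i \quad (i = 2, \dots, n)

% Anna, id = 100:
v_2 = 100\,(1 + 4) - 20 = 480, \qquad
v_3 = 480\,(1 + 4) - 25 = 2375
```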
Is it possible to achieve this somehow, preferably without a loop?
Answer 0 (score: 2)
I don't know how to avoid a loop, but you can use one inside data.table, and I think it will still be quite efficient:
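### DT re-created with the following code
library(data.table)
DT <- data.table(
  name = c("Anna","Anna","Anna","Jenny","Jenny","Jenny","Jenny"),
  id = c(100L,100L,100L,250L,250L,250L,250L),
  time = as.Date(c("2000-01-01","2000-02-01","2000-03-01","2008-01-01","2008-02-01",
                   "2008-03-01","2008-04-01")),
  start = c(100,100,100,200,200,200,200),
  rate = c(4,4,4,5,5,5,5),
  payment = c(15,20,25,10,20,30,35))
###
# Computes new_var for one (name, id) group; subDT is that group's rows (.SD)
computeNewVar <- function(subDT){
  # Only v[1] keeps its start value; rows 2..n are overwritten below
  v <- subDT$start
  if(nrow(subDT) > 1){
    for(i in 2:nrow(subDT)){
      # previous value times (1 + rate), minus the current row's payment
      v[i] <- v[i-1] * (1 + subDT$rate[i]) - subDT$payment[i]
    }
  }
  v
}
# Apply per group and add the result as a new column by reference
DT[, new_var := computeNewVar(.SD), by = .(name, id)]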
Result:
> DT
name id time start rate payment new_var
1: Anna 100 2000-01-01 100 4 15 100
2: Anna 100 2000-02-01 100 4 20 480
3: Anna 100 2000-03-01 100 4 25 2375
4: Jenny 250 2008-01-01 200 5 10 200
5: Jenny 250 2008-02-01 200 5 20 1180
6: Jenny 250 2008-03-01 200 5 30 7050
7: Jenny 250 2008-04-01 200 5 35 42265
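If the goal is only to avoid writing the for loop by hand, the same per-group recurrence can be expressed as a fold with base R's Reduce and accumulate = TRUE. This is a minimal sketch, not a faster method: Reduce still iterates internally. The helper name accumulateNewVar is made up for this example, and the time column is left out for brevity.

```r
library(data.table)

# Same example data as in the question (time column omitted).
DT <- data.table(
  name    = c("Anna", "Anna", "Anna", "Jenny", "Jenny", "Jenny", "Jenny"),
  id      = c(100L, 100L, 100L, 250L, 250L, 250L, 250L),
  start   = c(100, 100, 100, 200, 200, 200, 200),
  rate    = c(4, 4, 4, 5, 5, 5, 5),
  payment = c(15, 20, 25, 10, 20, 30, 35)
)

# accumulateNewVar is a hypothetical helper name, not part of data.table.
accumulateNewVar <- function(start, rate, payment) {
  n <- length(start)
  if (n < 2L) return(start)  # a single-row group just keeps its start value
  # Fold over rows 2..n, carrying the previous value forward.
  # accumulate = TRUE keeps every intermediate value, including init.
  Reduce(
    function(prev, i) prev * (1 + rate[i]) - payment[i],
    2:n,
    accumulate = TRUE,
    init = start[1]
  )
}

DT[, new_var := accumulateNewVar(start, rate, payment), by = .(name, id)]
```

Passing the columns directly instead of .SD avoids building a sub-table for each group, which tends to matter when there are many small groups.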
Answer 1 (score: 1)
I am a little wary of the numerical approach, but here are some variations.
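One fully vectorized variation of the recurrence from the question is sketched below; it is an assumption on my part and not necessarily what this answer originally showed. Dividing the recurrence by the running growth factor g[i] = (1 + rate[2]) * ... * (1 + rate[i]) turns it into a plain cumulative sum, so cumprod and cumsum are enough. The helper name closedFormNewVar is made up, and the approach assumes 1 + rate is never zero. The numerical worry is real: g can overflow or lose precision for long groups or large rates, so results should be compared against the loop version.

```r
library(data.table)

# Same example data as in the question (time column omitted).
DT <- data.table(
  name    = c("Anna", "Anna", "Anna", "Jenny", "Jenny", "Jenny", "Jenny"),
  id      = c(100L, 100L, 100L, 250L, 250L, 250L, 250L),
  start   = c(100, 100, 100, 200, 200, 200, 200),
  rate    = c(4, 4, 4, 5, 5, 5, 5),
  payment = c(15, 20, 25, 10, 20, 30, 35)
)

# closedFormNewVar is a hypothetical helper name, not part of data.table.
# Assumes 1 + rate is never zero within a group.
closedFormNewVar <- function(start, rate, payment) {
  # Running growth factor: g[1] = 1, g[i] = prod over k = 2..i of (1 + rate[k])
  g <- cumprod(c(1, 1 + rate[-1]))
  # v[i] / g[i] = start[1] - sum over j = 2..i of payment[j] / g[j]
  g * (start[1] - cumsum(c(0, payment[-1] / g[-1])))
}

DT[, new_var := closedFormNewVar(start, rate, payment), by = .(name, id)]
```

Because of floating-point rounding, comparing against the loop result with all.equal rather than == is the safer check.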