我有一个数据表,如下所示:
DT<-data.table(day=c(1,2,3,4,5,6,7,8),Consumption=c(5,9,10,2,NA,NA,NA,NA),id=c(1,2,3,1,1,2,2,1))
day Consumption id
1: 1 5 1
2: 2 9 2
3: 3 10 3
4: 4 2 1
5: 5 NA 1
6: 6 NA 2
7: 7 NA 2
8: 8 NA 1
我想创建两列,以显示观察前的最后一个非Na消耗量值,以及使用id组的那些观察之间的日差。到目前为止,我已经尝试过:
DT[, j := day-shift(day, fill = NA,n=1), by = id]
DT[, yj := shift(Consumption, fill = NA,n=1), by = id]
day Consumption id j yj
1: 1 5 1 NA NA
2: 2 9 2 NA NA
3: 3 10 3 NA NA
4: 4 2 1 3 5
5: 5 NA 1 1 2
6: 6 NA 2 4 9
7: 7 NA 2 1 NA
8: 8 NA 1 3 NA
但是,我希望n = 1的滞后消耗值来自具有非NA消耗值的行。例如,在第七行和“ yj”列中,yj值为NA,因为它来自消耗NA的第六行。我希望它来自第二行。因此,我希望最终得到这个数据表:
day Consumption id j yj
1: 1 5 1 NA NA
2: 2 9 2 NA NA
3: 3 10 3 NA NA
4: 4 2 1 3 5
5: 5 NA 1 1 2
6: 6 NA 2 4 9
7: 7 NA 2 5 9
8: 8 NA 1 4 2
注意:之所以专门使用移位函数的参数n,是因为在下一步中,我还将需要倒数第二个非Na消耗值。
谢谢
答案 0 :(得分:0)
这是data.table的zoo解决方案:
library(data.table)
library(zoo)
DT[, `:=`(day_shift = shift(day),
yj = shift(Consumption)),
by = id]
#make the NA yj records NA for the days
DT[is.na(yj), day_shift := NA_integer_]
#fill the DT with the last non-NA value
DT[,
`:=`(day_shift = na.locf(day_shift, na.rm = F),
yj = zoo::na.locf(yj, na.rm = F)),
by = id]
# finally calculate j
DT[, j:= day - day_shift]
# you can clean up the ordering or remove columns later
DT
day Consumption id day_shift yj j
1: 1 5 1 NA NA NA
2: 2 9 2 NA NA NA
3: 3 10 3 NA NA NA
4: 4 2 1 1 5 3
5: 5 NA 1 4 2 1
6: 6 NA 2 2 9 4
7: 7 NA 2 2 9 5
8: 8 NA 1 4 2 4