Shift the last non-NA value by ID

Time: 2019-09-13 07:52:47

Tags: r datatable na shift

I have a data table that looks like this:

library(data.table)
DT <- data.table(day = c(1, 2, 3, 4, 5, 6, 7, 8), Consumption = c(5, 9, 10, 2, NA, NA, NA, NA), id = c(1, 2, 3, 1, 1, 2, 2, 1))

   day Consumption id
1:   1           5  1
2:   2           9  2
3:   3          10  3
4:   4           2  1
5:   5          NA  1
6:   6          NA  2
7:   7          NA  2
8:   8          NA  1

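I want to create two columns, both computed within each id group: one showing the last non-NA Consumption value before each observation, and one showing the difference in days between those two observations. So far I have tried: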

DT[, j := day-shift(day, fill = NA,n=1), by = id]
DT[, yj := shift(Consumption, fill = NA,n=1), by = id]

   day Consumption id  j yj
1:   1           5  1 NA NA
2:   2           9  2 NA NA
3:   3          10  3 NA NA
4:   4           2  1  3  5
5:   5          NA  1  1  2
6:   6          NA  2  4  9
7:   7          NA  2  1 NA
8:   8          NA  1  3 NA 

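However, I want the lagged Consumption value (n = 1) to come from a row that has a non-NA Consumption value. For example, in row 7 the "yj" value is NA because it is taken from row 6, where Consumption is NA. I want it to come from row 2 instead. So the data table I want to end up with is: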

   day Consumption id  j yj
1:   1           5  1 NA NA
2:   2           9  2 NA NA
3:   3          10  3 NA NA
4:   4           2  1  3  5
5:   5          NA  1  1  2
6:   6          NA  2  4  9
7:   7          NA  2  5  9
8:   8          NA  1  4  2

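Note: I am deliberately using the n argument of the shift function because, in the next step, I will also need the second-to-last non-NA Consumption value.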

Thanks.

1 Answer:

Answer 0: (score: 0)

Here is a solution:

library(data.table)
library(zoo)

DT[, `:=`(day_shift = shift(day),
          yj = shift(Consumption)),
   by = id]

# blank out the lagged day wherever the lagged Consumption is NA
DT[is.na(yj), day_shift := NA_real_]

# carry the last non-NA value forward within each id
DT[,
   `:=`(day_shift = zoo::na.locf(day_shift, na.rm = FALSE),
        yj = zoo::na.locf(yj, na.rm = FALSE)),
   by = id]

# finally calculate j
DT[, j:= day - day_shift]

# you can clean up the ordering or remove columns later
DT

   day Consumption id day_shift yj  j
1:   1           5  1        NA NA NA
2:   2           9  2        NA NA NA
3:   3          10  3        NA NA NA
4:   4           2  1         1  5  3
5:   5          NA  1         4  2  1
6:   6          NA  2         2  9  4
7:   7          NA  2         2  9  5
8:   8          NA  1         4  2  4
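
The question also mentions needing the second-to-last non-NA value later on. One way this could be generalized is sketched below. The helper name `prev_non_na_pos` is hypothetical (it is not part of data.table or zoo), and the column name `yj2` is made up for illustration. The idea is to find, for each row, the position of the n-th most recent non-NA Consumption value strictly before it within the same id, and then index both Consumption and day with that position.

```r
library(data.table)

DT <- data.table(day = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Consumption = c(5, 9, 10, 2, NA, NA, NA, NA),
                 id = c(1, 2, 3, 1, 1, 2, 2, 1))

# Hypothetical helper (not part of data.table or zoo):
# for each position in x, return the position of the n-th most recent
# non-NA value strictly before it, or NA if fewer than n exist.
prev_non_na_pos <- function(x, n = 1L) {
  ok   <- !is.na(x)
  seen <- cumsum(ok) - ok   # count of non-NA values strictly before each position
  pos  <- which(ok)         # positions of the non-NA values
  idx  <- seen - n + 1L     # index into pos of the wanted value
  idx[idx < 1L] <- NA       # not enough history yet
  pos[idx]
}

# yj / j use the last non-NA value; yj2 uses the second-to-last one
DT[, c("yj", "j", "yj2") := {
  p1 <- prev_non_na_pos(Consumption, n = 1L)
  p2 <- prev_non_na_pos(Consumption, n = 2L)
  list(Consumption[p1], day - day[p1], Consumption[p2])
}, by = id]

DT
```

On the sample data, yj and j should match the desired table from the question, and yj2 is 5 for the rows with day 5 and day 8 (id 1) and NA elsewhere, since ids 2 and 3 each have only one non-NA value. This sketch assumes the rows are already ordered by day within each id.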