NA imputation based on group and adjacent variables

Time: 2019-05-08 10:18:33

Tags: r data.table na imputation

Dataset

df <- data.frame(ID = c(55, 55, 55, 55, 55, 55, 55, 55, 55, 55,
                        66, 66, 66, 66, 66, 66, 66, 66, 66, 66),
                 counter = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                             0, 1, 1, 1, 1, 1, 1, 1, 1, 1))

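The code below creates three features, two, three, and four, which hold the sum of the previous two, three, or four observations of counter, excluding the current row. The calculation is grouped by ID.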

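library(data.table)
# Sum of the previous 2, 3, and 4 values of counter within each ID (current row excluded)
setDT(df)[,  two := Reduce(`+`, shift(counter, 1:2)), by = ID]
setDT(df)[,  three := Reduce(`+`, shift(counter, 1:3)), by = ID]
setDT(df)[,  four := Reduce(`+`, shift(counter, 1:4)), by = ID]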

Current output:

    ID counter two three four
 1: 55       0  NA    NA   NA
 2: 55       1  NA    NA   NA
 3: 55       1   1    NA   NA
 4: 55       1   2     2   NA
 5: 55       1   2     3    3
 6: 55       1   2     3    4
 7: 55       1   2     3    4
 8: 55       1   2     3    4
 9: 55       1   2     3    4
10: 55       1   2     3    4
11: 66       0  NA    NA   NA
12: 66       1  NA    NA   NA
13: 66       1   1    NA   NA
14: 66       1   2     2   NA
15: 66       1   2     3    3
16: 66       1   2     3    4
17: 66       1   2     3    4
18: 66       1   2     3    4
19: 66       1   2     3    4
20: 66       1   2     3    4
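
The NA values at the start of each group come from how shift pads positions that have no earlier observation. A minimal sketch on a small vector (the vector x and the name lags are made up for illustration) shows the mechanism:

```r
library(data.table)

# First four counter values of one ID group
x <- c(0, 1, 1, 1)

# shift() with several lags returns a list with one lagged vector per lag;
# positions with no earlier value are padded with NA by default.
lags <- shift(x, 1:2)
print(lags)

# Reduce(`+`, ...) adds the lagged vectors element-wise. NA plus a number
# is NA, so the first two positions of the sum stay NA.
print(Reduce(`+`, lags))
# NA NA 1 2
```

This matches the first four rows of the column two above.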

Desired output:

   ID counter two three four
1  55       0   0     0    0
2  55       1   0     0    0
3  55       1   1     1    1
4  55       1   1     2    2
5  55       1   2     3    3
6  55       1   2     3    4
7  55       1   2     3    4
8  55       1   2     3    4
9  55       1   2     3    4
10 55       1   2     3    4
11 66       0   0     0    0
12 66       1   0     0    0
13 66       1   1     1    1
14 66       1   1     2    2
15 66       1   2     3    3
16 66       1   2     3    4
17 66       1   2     3    4
18 66       1   2     3    4
19 66       1   2     3    4
20 66       1   2     3    4

1 answer:

Answer 0 (score: 2)

We can specify the fill argument of shift, so that missing lags are padded with 0 instead of NA:

setDT(df)[,  two := Reduce(`+`, shift(counter, 1:2, fill = 0)), by = ID]
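
The answer only shows the column two. As a sketch, assuming the same pattern is meant for the other two columns, the full example could look like this (the data frame is rebuilt with rep for brevity, but holds the same values as the question):

```r
library(data.table)

# Same data as in the question: two IDs, ten rows each
df <- data.frame(ID = rep(c(55, 66), each = 10),
                 counter = rep(c(0, rep(1, 9)), times = 2))
setDT(df)

# Sum of the previous 2, 3, and 4 values of counter within each ID.
# fill = 0 pads the missing lags with 0 instead of NA.
df[, two   := Reduce(`+`, shift(counter, 1:2, fill = 0)), by = ID]
df[, three := Reduce(`+`, shift(counter, 1:3, fill = 0)), by = ID]
df[, four  := Reduce(`+`, shift(counter, 1:4, fill = 0)), by = ID]

print(df)

# None of the new columns should contain NA any more.
print(df[, lapply(.SD, anyNA), .SDcols = c("two", "three", "four")])
```

Note that fill only changes the padded positions. Rows that were already non-NA keep their values, so the fourth row of each group still has two equal to 2, as in the current output, rather than the 1 shown in the desired output.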