Extract the previous value by group from another ID column

Posted: 2017-11-05 19:43:22

Tags: r datatable dplyr tidyr

I have a data frame:

ID_1  <- c("A","B","C","D","A","A","B","E","D","F","H")
ID_2  <- c("G","D","I","A","J","B","K","D","A","H","A")
Value <- c(10,9,15,27,3,28,4,3,11,19,12)
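# data.frame() keeps Value numeric; cbind() would first coerce every column to character
DF <- data.frame(ID_1, ID_2, Value, stringsAsFactors = FALSE)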

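I want a new column that holds, for the ID in 'ID_1', the last (i.e., previous) 'Value' recorded where that same ID appeared in the other ID column ('ID_2'). In other words: for each row, the solution should find the most recent earlier row whose 'ID_2' equals this row's 'ID_1' and copy that row's 'Value' into the new column (NA if there is no such row).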

The final data set should look like this (one new column added to the existing three; for illustration):

NEW    <- c(NA,NA,NA,9,27,27,28,NA,3,NA,19)
DF_NEW <- data.frame(ID_1, ID_2, Value, NEW, stringsAsFactors = FALSE)
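Stated as code, the rule is: for row i, scan the strictly earlier rows 1..i-1 for the last one whose ID_2 equals row i's ID_1, and take its Value. The following is a minimal base-R sketch that reproduces NEW. It is a plain O(n^2) loop meant only to pin down the expected result, not an efficient solution, and the helper name `prev_value_loop` is hypothetical.

```r
# Data from the question, built with data.frame() so Value stays numeric
ID_1  <- c("A","B","C","D","A","A","B","E","D","F","H")
ID_2  <- c("G","D","I","A","J","B","K","D","A","H","A")
Value <- c(10,9,15,27,3,28,4,3,11,19,12)
DF <- data.frame(ID_1, ID_2, Value, stringsAsFactors = FALSE)

# Hypothetical helper: for each row i, return Value from the most recent
# earlier row whose ID_2 equals row i's ID_1, or NA if there is none.
prev_value_loop <- function(df) {
  out <- rep(NA_real_, nrow(df))
  for (i in seq_len(nrow(df))) {
    # indices of earlier rows (1..i-1) whose ID_2 matches this row's ID_1
    hits <- which(df$ID_2[seq_len(i - 1)] == df$ID_1[i])
    if (length(hits) > 0) out[i] <- df$Value[max(hits)]  # latest match wins
  }
  out
}

DF$NEW <- prev_value_loop(DF)
DF$NEW
# [1] NA NA NA  9 27 27 28 NA  3 NA 19
```

Row 9 (D) shows why "last" matters: D appears in ID_2 at rows 2 and 8, and the later one (Value 3) is taken.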

Thanks in advance for your help!

1 answer:

Answer 0 (score: 1)

One option is to add a row-number column to DF and then self-join with a data.table rolling join:

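library(data.table)
setDT(DF)[, rn := seq_len(.N)]   # convert to data.table and add a row-number column

# x = DF matched on ID_2, i = DF matched on ID_1. roll = Inf rolls the last
# join column (rn): each i row gets the x row with the same ID whose rn is
# the largest value <= its own rn (last observation carried forward).
DF[DF, 
    on = .(ID_2 = ID_1, rn = rn), 
    .(ID_1 = i.ID_1, ID_2 = i.ID_2, Value = i.Value, New = x.Value), 
    roll = Inf
]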

#    ID_1 ID_2 Value New
# 1:    A    G    10  NA
# 2:    B    D     9  NA
# 3:    C    I    15  NA
# 4:    D    A    27   9
# 5:    A    J     3  27
# 6:    A    B    28  27
# 7:    B    K     4  28
# 8:    E    D     3  NA
# 9:    D    A    11   3
#10:    F    H    19  NA
#11:    H    A    12  19
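To see what `roll = Inf` does in the join above in isolation, here is a toy sketch with hypothetical tables `lookup` and `query` (not from the question). `lookup` plays the role of DF keyed by ID_2, and `query` plays the role of DF keyed by ID_1.

```r
library(data.table)

# Toy lookup table: key "A" observed at row numbers 1, 3 and 5
lookup <- data.table(ID = c("A", "A", "A"), rn = c(1L, 3L, 5L), Value = c(10, 20, 30))
# Queries: key "A" at row numbers 0, 3 and 4, plus a key "B" that never occurs
query <- data.table(ID = c("A", "A", "A", "B"), rn = c(0L, 3L, 4L, 4L))

# roll = Inf rolls the last `on` column (rn): each query row takes the lookup
# row with the same ID and the largest rn <= its own rn; no such row gives NA
res <- lookup[query, on = .(ID, rn),
              .(ID = i.ID, rn = i.rn, Value = x.Value), roll = Inf]
res$Value
# [1] NA 20 20 NA
```

rn = 0 has nothing at or before it, rn = 3 matches exactly, rn = 4 rolls back to rn = 3, and "B" has no candidates. Because the rolled comparison is `<=`, a row whose own ID_1 equals its own ID_2 would match itself in the answer's self-join; that never happens in the question's data.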