I have a data frame with three variables: id, state (a factor), and t (an integer). Each id may have multiple observations. See below:
id <- c(1,1,1,1,1,2,2,2,2,3,4,4,4,5,5)
state <- c("a", "b", "c", "d", "e", "a", "a", "c", "e", "b", "e", "a", "c", "b", "a")
t <- c(1,2,3,4,5, 1,2,3,4, 1, 1,2,3, 1,2)
dat <- data.frame(id, state, t)
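I want to create a new data frame with two variables: "from" and "to". For each id, I want to see how the state changes: for example, for id 1, the state moves from a to b, then from b to c, then from c to d, and finally from d to e. If an id has only one record, it should be ignored (e.g., id = 3).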
# The final data set looks like:
from <- c("a", "b", "c", "d", "a", "a", "c", "e", "a", "b")
to <- c("b", "c", "d", "e", "a", "c", "e", "a", "c", "a")
dat2 <- data.frame(from, to)
Answer 0 (score: 1)
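We can try data.table. Convert the 'data.frame' to a 'data.table' (setDT(dat)), group by 'id', drop the last observation of 'state' to create 'from', and drop the first observation to create 'to'. An id with a single row yields two empty vectors, so it contributes no rows to the result.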
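library(data.table)
# Within each id: 'from' is state without its last element, 'to' is state without its first.
# The grouping column is dropped afterwards; the trailing [] prints the result.
setDT(dat)[, .(from = state[-.N], to = state[-1]), id][, id := NULL][]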
# from to
# 1: a b
# 2: b c
# 3: c d
# 4: d e
# 5: a a
# 6: a c
# 7: c e
# 8: e a
# 9: a c
#10: b a
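For comparison, the same pairing idea can be sketched in base R, without data.table. This is only an illustrative alternative, not part of the original answer. It assumes the rows should be ordered by t within each id (they already are in the example data), and the names s1 and pairs are made up for the illustration.

```r
# Rebuild the example data from the question.
id    <- c(1,1,1,1,1,2,2,2,2,3,4,4,4,5,5)
state <- c("a", "b", "c", "d", "e", "a", "a", "c", "e", "b", "e", "a", "c", "b", "a")
t     <- c(1,2,3,4,5, 1,2,3,4, 1, 1,2,3, 1,2)
dat   <- data.frame(id, state, t, stringsAsFactors = FALSE)

# Assumption: transitions follow the order of t within each id.
dat <- dat[order(dat$id, dat$t), ]

# One group by hand: drop the last element for "from", the first for "to".
s1 <- dat$state[dat$id == 1]
head(s1, -1)  # states that are left
tail(s1, -1)  # states that are entered

# Apply the same pairing to every id, then stack the pieces.
# An id with a single row produces a zero-row data frame and so drops out.
pairs <- lapply(split(dat$state, dat$id), function(s) {
  data.frame(from = head(s, -1), to = tail(s, -1), stringsAsFactors = FALSE)
})
dat2 <- do.call(rbind, pairs)
rownames(dat2) <- NULL
dat2
```

The result should match the dat2 given in the question, with id = 3 contributing nothing.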