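I'm struggling with this problem, and I don't know whether there is a more elegant solution than looping over the column with a pile of conditional logic. My data looks like this: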
df <- data.frame(user = c(rep("01", 14), rep("02", 6), rep("03", 9)),
                 time = c(1:14, 1:6, 1:9),
                 event = c(rep("a", 3), "d", rep("a", 1), rep("b", 2), rep("a", 2), rep("d", 2), rep("a", 3),
                           rep("b", 2), rep("a", 2), rep("c", 2),
                           rep("d", 2), rep("b", 5), rep("c", 1), rep("b", 1)))
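For each user, I want every point at which event changes over time, together with the number of times the previous event occurred. So the result would look like this: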
data.frame(user=c(rep("01",6),rep("02",2),rep("03",3)),
source=c("a","d","a","b","a","d", "b","a", "d","b","c"),
target=c("d","a","b","a","d","a", "a","c", "b","c","b"),
source_cnt=c(3,1,1,2,2,2 ,2,2, 2,5,1))
Any suggestions?
Answer 0 (score: 1)
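do.call(rbind, lapply(split(df, df$user), function(x) {
  # Run-length encode this user's events: consecutive repeats collapse into one run
  ev = rle(as.character(x$event))
  # Pair each run with the one that follows it; the final run has no successor
  data.frame(user = x$user[1],
             source = head(ev$values, -1),
             target = ev$values[-1],
             source_cnt = head(ev$lengths, -1))
}))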
#     user source target source_cnt
#01.1   01      a      d          3
#01.2   01      d      a          1
#01.3   01      a      b          1
#01.4   01      b      a          2
#01.5   01      a      d          2
#01.6   01      d      a          2
#02.1   02      b      a          2
#02.2   02      a      c          2
#03.1   03      d      b          2
#03.2   03      b      c          5
#03.3   03      c      b          1
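
To see why this works, it may help to look at what rle() returns for a single user. The following is only a small sketch using user 01's events from the question; the names ev01, runs and transitions are illustrative and not part of the original code.

```r
# Events for user "01", copied from the question's data
ev01 <- c(rep("a", 3), "d", "a", rep("b", 2), rep("a", 2), rep("d", 2), rep("a", 3))

# rle() collapses consecutive repeats into run values and run lengths
runs <- rle(ev01)
runs$values   # "a" "d" "a" "b" "a" "d" "a"
runs$lengths  # 3 1 1 2 2 2 3

# Each transition pairs a run with the run that follows it.
# The last run has no successor, so it is dropped from the source side.
transitions <- data.frame(
  source     = head(runs$values, -1),
  target     = runs$values[-1],
  source_cnt = head(runs$lengths, -1)
)
transitions
```

The six rows of transitions correspond to the six rows for user 01 in the output above; the split() and lapply() wrapper simply repeats this step for every user and stacks the pieces with rbind.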