Get the counts and values of different event orders

Asked: 2017-10-06 20:24:24

Tags: r

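I'm struggling with this problem, and I don't know whether there is a more elegant solution than looping over the column with a bunch of conditional logic built in. My data looks like this: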

df <- data.frame(user = c(rep("01",14), rep("02",6), rep("03",9)), time = c(1:14, 1:6, 1:9),
                 event = c(rep("a",3), "d", rep("a",1), rep("b",2), rep("a",2), rep("d",2), rep("a",3),
                           rep("b",2), rep("a",2), rep("c",2), rep("d",2), rep("b",5), rep("c",1), rep("b",1)))

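For each user, I want every change of event over time, along with the number of times the previous event occurred before the change. So the result would look like this: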

data.frame(user=c(rep("01",6),rep("02",2),rep("03",3)), 
           source=c("a","d","a","b","a","d", "b","a", "d","b","c"), 
           target=c("d","a","b","a","d","a", "a","c", "b","c","b"),
           source_cnt=c(3,1,1,2,2,2 ,2,2, 2,5,1))

Any suggestions?

1 answer:

Answer 0 (score: 1)

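# Process each user separately, then stack the per-user results
do.call(rbind, lapply(split(df, df$user), function(x){
    # Run-length encode the events: consecutive repeats become (value, length) pairs
    ev = rle(as.character(x$event))
    # Pair every run with the run that follows it; the last run has no target
    data.frame(user = x$user[1],
               source = head(ev$values, -1),
               target = ev$values[-1],
               source_cnt = head(ev$lengths, -1))
}))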
#     user source target source_cnt
#01.1   01      a      d          3
#01.2   01      d      a          1
#01.3   01      a      b          1
#01.4   01      b      a          2
#01.5   01      a      d          2
#01.6   01      d      a          2
#02.1   02      b      a          2
#02.2   02      a      c          2
#03.1   03      d      b          2
#03.2   03      b      c          5
#03.3   03      c      b          1
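
To see why this works, it may help to look at what `rle()` returns for a single user. The sketch below is only an illustration: it copies the events of user "01" from the question into a vector, and the names `events`, `runs` and `transitions` are made up for the example.

```r
# Events for user "01", copied from the question's data
events <- c("a", "a", "a", "d", "a", "b", "b", "a", "a", "d", "d", "a", "a", "a")

# rle() collapses consecutive repeats into (value, length) pairs
runs <- rle(events)
runs$values   # "a" "d" "a" "b" "a" "d" "a"
runs$lengths  # 3 1 1 2 2 2 3

# Pair each run with the one that follows it. The final run has no successor,
# so it is dropped from the source side and never appears as a source.
transitions <- data.frame(source     = head(runs$values, -1),
                          target     = runs$values[-1],
                          source_cnt = head(runs$lengths, -1))
transitions
```

The seven runs yield six transitions, which are the six rows shown for user "01" above. A user whose event never changes has only one run and therefore no transitions at all.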