How do I create a column with a conditional rank index over timestamps?

Time: 2016-12-20 15:11:38

Tags: r data.table

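I have run into a challenge in R and would really appreciate some help. I want to add a column to my data set (100,000+ rows) that gives the order of each person's visits (visitId) based on visit time. The count should start at 1 for the most recent visit and increase going back in time. To make it a bit more complicated, the count should start again from 1 when a visit was successful.

Dummy data example: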

library(data.table)

# Dummy data: one row per visit
person <- c("a","b","c","d","a","b","c","d","a","b")
visitId <- c(121,131,141,151,161,171,181,191,201,212)
timePM <- c(1,2,3,4,5,6,7,8,10,11)
sucess <- c(0,0,0,0,1,0,1,0,0,0) # 1 = successful visit
data <- data.table(person, visitId, timePM, sucess)

The final result should look like this:

person <- c("a","b","c","d","a","b","c","d","a","b")
visitId <- c(121,131,141,151,161,171,181,191,201,212)
timePM <- c(1,2,3,4,5,6,7,8,10,11)
sucess <- c(0,0,0,0,1,0,1,0,0,0)
indexOrder <- c(2,3,2,2,1,2,1,1,1,1)
data <- data.table(person,visitId,timePM ,sucess,indexOrder)

I tried nested for loops but did not manage to solve the problem. I really hope someone can give me some hints.

Many thanks in advance!

2 answers:

Answer 0: (score: 2)

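Basically, you are just running a cumulative sum of sucess == 0 per person, in decreasing time order. The only case (that I can think of) where a plain cumsum won't work is when the first visit in that order, i.e. the most recent one, was successful: the count would then start at 0 instead of 1. So I added a condition for that case, and this seems to work: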

data[order(person, -timePM), # Sort by person and time (in decreasing order)
     indexOrder2 := cumsum(sucess == 0L | sucess[1L] == 1L), # cumsum with additional condition
     by = person] # Make sure we operate per person
data
#     person visitId timePM sucess indexOrder indexOrder2
#  1:      a     121      1      0          2           2
#  2:      b     131      2      0          3           3
#  3:      c     141      3      0          2           2
#  4:      d     151      4      0          2           2
#  5:      a     161      5      1          1           1
#  6:      b     171      6      0          2           2
#  7:      c     181      7      1          1           1
#  8:      d     191      8      0          1           1
#  9:      a     201     10      0          1           1
# 10:      b     212     11      0          1           1
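
To see why the extra condition is needed, it may help to walk through the logical vector for two people by hand. This is only a base R sketch of the expression used above; the vectors `sucess_a` and `sucess_c` are names made up here and hold the `sucess` values of persons "a" and "c" already sorted from the most recent visit to the oldest.

```r
# Person "a", most recent visit first (timePM 10, 5, 1)
sucess_a <- c(0, 1, 0)
step_a <- sucess_a == 0L | sucess_a[1L] == 1L
step_a          # TRUE FALSE TRUE: the successful visit does not increment
cumsum(step_a)  # 1 1 2

# Person "c", most recent visit was a success (timePM 7, 3)
sucess_c <- c(1, 0)
cumsum(sucess_c == 0L)  # 0 1: a plain cumsum would start at 0
step_c <- sucess_c == 0L | sucess_c[1L] == 1L
cumsum(step_c)          # 1 2: the extra condition shifts the count to start at 1
```

The values 1 1 2 and 1 2 agree with the rows for "a" and "c" in the output above.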

Answer 1: (score: 0)

If you want a dplyr version of David's answer:

library(dplyr)

person <- c("a","b","c","d","a","b","c","d","a","b")
visitId <- c(121,131,141,151,161,171,181,191,201,212)
timePM <- c(1,2,3,4,5,6,7,8,10,11)
sucess <- c(0,0,0,0,1,0,1,0,0,0)
indexOrder <- c(2,3,2,2,1,2,1,1,1,1)
data <- tibble(person, visitId, timePM, sucess, indexOrder) # data_frame() is deprecated

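data %>%
    group_by(person) %>%
    arrange(person, -timePM) %>% # most recent visit first within each person
    mutate(IndexOrder2 = cumsum(sucess == 0L | sucess[1L] == 1L)) %>% # computed per person
    ungroup() %>%
    arrange(timePM) # restore the original time order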
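
If neither package is available, the same idea can probably be written in base R with order() and ave(). This is only a sketch under that assumption, not part of either answer; the helper names `ord` and `idx` are made up for illustration.

```r
person <- c("a","b","c","d","a","b","c","d","a","b")
visitId <- c(121,131,141,151,161,171,181,191,201,212)
timePM <- c(1,2,3,4,5,6,7,8,10,11)
sucess <- c(0,0,0,0,1,0,1,0,0,0)
data <- data.frame(person, visitId, timePM, sucess)

# Row positions sorted by person, most recent visit first
ord <- order(data$person, -data$timePM)

# Per-person cumulative count on the sorted rows, written back to the original positions
idx <- numeric(nrow(data))
idx[ord] <- ave(data$sucess[ord], data$person[ord],
                FUN = function(s) cumsum(s == 0 | s[1] == 1))
data$indexOrder2 <- idx
data
```

On the dummy data this gives the same values as the indexOrder column in the question.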