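I've run into a challenge in R and would really appreciate some help. I want to add a column to my data set (100,000+ rows) that gives the order of each person's visitId based on visit time. Counting should start at 1 for the most recent visit and go up from there. To make it a bit more complicated, the count should restart at 1 when a visit is successful.
Dummy data example: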
library(data.table)
person <- c("a","b","c","d","a","b","c","d","a","b")
visitId <- c(121,131,141,151,161,171,181,191,201,212)
timePM <- c(1,2,3,4,5,6,7,8,10,11)
sucess <- c(0,0,0,0,1,0,1,0,0,0)
data <- data.table(person,visitId,timePM ,sucess)
The final result should look like this:
person <- c("a","b","c","d","a","b","c","d","a","b")
visitId <- c(121,131,141,151,161,171,181,191,201,212)
timePM <- c(1,2,3,4,5,6,7,8,10,11)
sucess <- c(0,0,0,0,1,0,1,0,0,0)
indexOrder <- c(2,3,2,2,1,2,1,1,1,1)
data <- data.table(person,visitId,timePM ,sucess,indexOrder)
I tried nested for loops, but I didn't manage to solve the problem. I really hope someone can give me some hints.
Many thanks in advance!
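Answer 0 (score: 2)
Basically, you are just trying to run a cumulative sum over the sucess == 0 events, per person, in a certain time order. The only case (that I can think of) where a simple cumsum won't work is when the first visit in that order, i.e. the most recent one, is a success, so I just added that condition. This seems to work: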
data[order(person, -timePM), # Sort by person and time (in decreasing order)
     indexOrder2 := cumsum(sucess == 0L | sucess[1L] == 1L), # cumsum with additional condition
     by = person] # Make sure we operate per person
data
#     person visitId timePM sucess indexOrder indexOrder2
#  1:      a     121      1      0          2           2
#  2:      b     131      2      0          3           3
#  3:      c     141      3      0          2           2
#  4:      d     151      4      0          2           2
#  5:      a     161      5      1          1           1
#  6:      b     171      6      0          2           2
#  7:      c     181      7      1          1           1
#  8:      d     191      8      0          1           1
#  9:      a     201     10      0          1           1
# 10:      b     212     11      0          1           1
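To see what the extra condition does, it can help to run the same expression on a single person's sucess values, already sorted from the most recent to the oldest visit. This is a minimal base R sketch rather than part of the original answer; the helper name index_order is hypothetical and used only for illustration.

```r
# Hypothetical helper: the j expression from the answer, applied to one
# person's sucess values sorted from most recent to oldest visit.
index_order <- function(sucess) {
  cumsum(sucess == 0L | sucess[1L] == 1L)
}

# Person a: visits at timePM 10, 5, 1 have sucess 0, 1, 0
index_order(c(0, 1, 0))
# [1] 1 1 2

# Person c: visits at timePM 7, 3 have sucess 1, 0
# (the most recent visit is a success, so the extra condition applies)
index_order(c(1, 0))
# [1] 1 2

# Without the extra condition, person c's most recent visit would get 0
cumsum(c(1, 0) == 0L)
# [1] 0 1
```

These values match the indexOrder entries for persons a and c in the expected output.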
Answer 1 (score: 0)
If you want a dplyr version of David's answer:
library(dplyr)
person <- c("a","b","c","d","a","b","c","d","a","b")
visitId <- c(121,131,141,151,161,171,181,191,201,212)
timePM <- c(1,2,3,4,5,6,7,8,10,11)
sucess <- c(0,0,0,0,1,0,1,0,0,0)
indexOrder <- c(2,3,2,2,1,2,1,1,1,1)
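data <- tibble(person, visitId, timePM, sucess, indexOrder) # tibble() replaces the deprecated data_frame()
data %>%
  group_by(person) %>%                  # operate per person
  arrange(person, desc(timePM)) %>%     # most recent visit first within each person
  mutate(IndexOrder2 = cumsum(sucess == 0L | sucess[1L] == 1L)) %>%
  ungroup() %>%
  arrange(timePM)                       # restore the original time order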
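One way to check the pipeline against the expected indexOrder column is to store the result and compare the two columns. The sketch below assumes dplyr is installed; the variable name result is introduced here only for illustration.

```r
library(dplyr)

# Same dummy data as in the question, including the expected indexOrder
data <- tibble(
  person     = c("a","b","c","d","a","b","c","d","a","b"),
  visitId    = c(121,131,141,151,161,171,181,191,201,212),
  timePM     = c(1,2,3,4,5,6,7,8,10,11),
  sucess     = c(0,0,0,0,1,0,1,0,0,0),
  indexOrder = c(2,3,2,2,1,2,1,1,1,1)
)

# Hypothetical name `result`: the pipeline output kept for comparison
result <- data %>%
  group_by(person) %>%
  arrange(person, desc(timePM)) %>%
  mutate(IndexOrder2 = cumsum(sucess == 0L | sucess[1L] == 1L)) %>%
  ungroup() %>%
  arrange(timePM)

# TRUE when every computed value equals the expected one
all(result$IndexOrder2 == result$indexOrder)
```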