I would like to group the rows of the following data frame:
time <- c("01/01/1951", "02/01/1951", "03/01/1951", "04/01/1951", "03/03/1953", "04/03/1953", "05/03/1953", "06/03/1953", "02/01/1951", "03/01/1951", "04/01/1951", "05/01/1951", "13/03/1953", "14/03/1953", "15/03/1953", "16/03/1953", "01/05/1951", "02/05/1951", "03/05/1951", "04/05/1951", "04/03/1953", "05/03/1953", "06/03/1953", "07/03/1953")
member <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
trainall <- data.frame(time, member)
trainall$time = as.Date(trainall$time,format="%d/%m/%Y")
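The grouping should be by runs of consecutive days within each member. So if the same day appears for both member 1 and member 2, I don't want those rows grouped together as consecutive! Ultimately I want a new column that identifies the group.
This is what I tried, but it did not work: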
y = sort(trainall$time)
trainall$g = cumsum(c(1, abs(y[-length(y)] - y[-1]) > 1))
This is the result I want:
trainall
time member g
1 01/01/1951 1 1
2 02/01/1951 1 1
3 03/01/1951 1 1
4 04/01/1951 1 1
5 03/03/1953 1 2
6 04/03/1953 1 2
7 05/03/1953 1 2
8 06/03/1953 1 2
9 02/01/1951 2 3
10 03/01/1951 2 3
11 04/01/1951 2 3
12 05/01/1951 2 3
13 13/03/1953 2 4
14 14/03/1953 2 4
15 15/03/1953 2 4
16 16/03/1953 2 4
17 01/05/1951 3 5
18 02/05/1951 3 5
19 03/05/1951 3 5
20 04/05/1951 3 5
21 04/03/1953 3 6
22 05/03/1953 3 6
23 06/03/1953 3 6
24 07/03/1953 3 6
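This is the result I am ultimately after. However, I built it by hand here, and my actual data frame is much larger (16 members).
Does anyone know an easy way to do this?

Answer 0 (score: 1)
Using logical values as the integers 0 and 1, together with your friend diff, does the trick. As long as your data are sorted by member and, within each member, by time, something like this should do it.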
# Your data
time <- c("01/01/1951", "02/01/1951", "03/01/1951", "04/01/1951", "03/03/1953", "04/03/1953", "05/03/1953", "06/03/1953", "02/01/1951", "03/01/1951", "04/01/1951", "05/01/1951", "13/03/1953", "14/03/1953", "15/03/1953", "16/03/1953", "01/05/1951", "02/05/1951", "03/05/1951", "04/05/1951", "04/03/1953", "05/03/1953", "06/03/1953", "07/03/1953")
member <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
trainall <- data.frame(time, member)
trainall$time = as.Date(trainall$time,format="%d/%m/%Y")
# Creating column g
trainall$g <- cumsum(c(1, (abs(diff(trainall$time)) + diff(trainall$member))!=1))
print(trainall)
# time member g
#1 1951-01-01 1 1
#2 1951-01-02 1 1
#3 1951-01-03 1 1
#4 1951-01-04 1 1
#5 1953-03-03 1 2
#6 1953-03-04 1 2
#7 1953-03-05 1 2
#8 1953-03-06 1 2
#9 1951-01-02 2 3
#10 1951-01-03 2 3
#11 1951-01-04 2 3
#12 1951-01-05 2 3
#13 1953-03-13 2 4
#14 1953-03-14 2 4
#15 1953-03-15 2 4
#16 1953-03-16 2 4
#17 1951-05-01 3 5
#18 1951-05-02 3 5
#19 1951-05-03 3 5
#20 1951-05-04 3 5
#21 1953-03-04 3 6
#22 1953-03-05 3 6
#23 1953-03-06 3 6
#24 1953-03-07 3 6
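Edit: added abs() around the time difference. I guess abs cannot strictly be omitted, because when the member changes you may have a time difference of -2 days, which results in a sum of 1.
Edit 2: Re. your additional comment, try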
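One case the summed expression does not cover, even with abs(): if a member's first date equals the previous member's last date and the member id goes up by exactly one, the sum is 0 + 1 = 1 and no new group starts. The sketch below uses a small made-up data frame `d` (not the data from the question) to show this, next to a variant that tests the two conditions separately. The names `d`, `g_sum`, `new_group` and `g_explicit` are illustrative only.

```r
# Hypothetical data: member 2 starts on the same day member 1 ends
d <- data.frame(
  time   = as.Date(c("1953-03-05", "1953-03-06", "1953-03-06", "1953-03-07")),
  member = c(1, 1, 2, 2)
)

# Summed expression: at row 3, abs(0 days) + 1 equals 1, so no new group starts
g_sum <- cumsum(c(1, (abs(diff(d$time)) + diff(d$member)) != 1))
g_sum
# 1 1 1 1

# Explicit conditions: start a new group when the gap is not exactly
# one day OR the member changes
new_group  <- c(TRUE, diff(d$time) != 1 | diff(d$member) != 0)
g_explicit <- cumsum(new_group)
g_explicit
# 1 1 2 2
```

Like the original line, this assumes the rows are sorted by member and then by time. On the data in the question both versions give the same `g`.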
trainall$G <- sequence(table(trainall$g))
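The comment this edit replies to is not shown here. Judging from the code alone, the line builds a running counter that restarts at 1 in every group. A minimal sketch on a made-up vector `g` (an assumption, not the question's data), with an `ave()` variant that should also work when equal group ids are not stored next to each other:

```r
g <- c(1, 1, 1, 2, 2, 3)   # hypothetical group ids, already in order

# table(g) counts the rows per group; sequence() expands each count n to 1..n
G <- as.vector(sequence(table(g)))
G
# 1 2 3 1 2 1

# Alternative: number the rows within each group directly
G_ave <- ave(seq_along(g), g, FUN = seq_along)
G_ave
# 1 2 3 1 2 1
```

The `sequence(table(...))` form relies on `g` being non-decreasing, which holds for a column built with `cumsum`.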

Answer 1 (score: 0)
Here is an option using .GRP from data.table:
library(data.table)
setDT(trainall)[, g := .GRP, .(member, grp = cumsum(c(FALSE, diff(time) != 1)))]
trainall
# time member g
# 1: 1951-01-01 1 1
# 2: 1951-01-02 1 1
# 3: 1951-01-03 1 1
# 4: 1951-01-04 1 1
# 5: 1953-03-03 1 2
# 6: 1953-03-04 1 2
# 7: 1953-03-05 1 2
# 8: 1953-03-06 1 2
# 9: 1951-01-02 2 3
#10: 1951-01-03 2 3
#11: 1951-01-04 2 3
#12: 1951-01-05 2 3
#13: 1953-03-13 2 4
#14: 1953-03-14 2 4
#15: 1953-03-15 2 4
#16: 1953-03-16 2 4
#17: 1951-05-01 3 5
#18: 1951-05-02 3 5
#19: 1951-05-03 3 5
#20: 1951-05-04 3 5
#21: 1953-03-04 3 6
#22: 1953-03-05 3 6
#23: 1953-03-06 3 6
#24: 1953-03-07 3 6
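For readers without data.table, roughly the same idea can be sketched in base R: flag the first row of each member and every row whose gap to the previous row of the same member is not one day, then take the cumulative sum of the flags. The small data set and the names `brk` and `g` below are made up for illustration. Rows are assumed to be sorted by member and, within each member, by time.

```r
# Hypothetical data: two runs for member 1, one run for member 2
time   <- as.Date(c("1951-01-01", "1951-01-02", "1953-03-03", "1953-03-04",
                    "1951-01-02", "1951-01-03"))
member <- c(1, 1, 1, 1, 2, 2)

# Within each member: 1 on the first row and wherever the day gap is not 1
brk <- ave(as.numeric(time), member,
           FUN = function(x) c(1, diff(x) != 1))

# Running total of the break flags gives one id per run
g <- cumsum(brk)
g
# 1 1 2 2 3 3
```

Because the breaks are computed per member, two members sharing a date cannot end up in the same group.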