How to group consecutive days by another category in R

Asked: 2018-04-04 08:53:02

Tags: r group-by

I would like to use the following data frame:

time <- c("01/01/1951", "02/01/1951", "03/01/1951", "04/01/1951", "03/03/1953", "04/03/1953", "05/03/1953", "06/03/1953", "02/01/1951", "03/01/1951", "04/01/1951", "05/01/1951", "13/03/1953", "14/03/1953", "15/03/1953", "16/03/1953", "01/05/1951", "02/05/1951", "03/05/1951", "04/05/1951", "04/03/1953", "05/03/1953", "06/03/1953", "07/03/1953")
member <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
trainall <- data.frame(time, member)
trainall$time = as.Date(trainall$time,format="%d/%m/%Y")

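I want to group consecutive days within each member. If the same day appears under both member 1 and member 2, I do not want those rows grouped together as consecutive. In the end I want a new column that holds the group number.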

This is what I tried, but it did not work:

y = sort(trainall$time)
trainall$g = cumsum(c(1, abs(y[-length(y)] - y[-1]) > 1))

This is the result I want:

 trainall
     time       member g
1  01/01/1951      1 1
2  02/01/1951      1 1
3  03/01/1951      1 1
4  04/01/1951      1 1
5  03/03/1953      1 2
6  04/03/1953      1 2
7  05/03/1953      1 2
8  06/03/1953      1 2
9  02/01/1951      2 3
10 03/01/1951      2 3
11 04/01/1951      2 3
12 05/01/1951      2 3
13 13/03/1953      2 4
14 14/03/1953      2 4
15 15/03/1953      2 4
16 16/03/1953      2 4
17 01/05/1951      3 5
18 02/05/1951      3 5
19 03/05/1951      3 5
20 04/05/1951      3 5
21 04/03/1953      3 6
22 05/03/1953      3 6
23 06/03/1953      3 6
24 07/03/1953      3 6

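That is the final result I am after. However, I built it by hand here, and my actual data frame is much larger (16 members).

Does anyone know an easy way to do this?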

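2 Answers:

Answer 0 (score: 1)

Treating logical values as the integers 0 and 1, together with your friend diff, does the trick. As long as your data is sorted by member and then by time, something like this should do it.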

# Your data
time <- c("01/01/1951", "02/01/1951", "03/01/1951", "04/01/1951", "03/03/1953", "04/03/1953", "05/03/1953", "06/03/1953", "02/01/1951", "03/01/1951", "04/01/1951", "05/01/1951", "13/03/1953", "14/03/1953", "15/03/1953", "16/03/1953", "01/05/1951", "02/05/1951", "03/05/1951", "04/05/1951", "04/03/1953", "05/03/1953", "06/03/1953", "07/03/1953")
member <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
trainall <- data.frame(time, member)
trainall$time = as.Date(trainall$time,format="%d/%m/%Y")

# Creating column g
trainall$g <- cumsum(c(1, (abs(diff(trainall$time)) + diff(trainall$member))!=1))
print(trainall)
#         time member g
#1  1951-01-01      1 1
#2  1951-01-02      1 1
#3  1951-01-03      1 1
#4  1951-01-04      1 1
#5  1953-03-03      1 2
#6  1953-03-04      1 2
#7  1953-03-05      1 2
#8  1953-03-06      1 2
#9  1951-01-02      2 3
#10 1951-01-03      2 3
#11 1951-01-04      2 3
#12 1951-01-05      2 3
#13 1953-03-13      2 4
#14 1953-03-14      2 4
#15 1953-03-15      2 4
#16 1953-03-16      2 4
#17 1951-05-01      3 5
#18 1951-05-02      3 5
#19 1951-05-03      3 5
#20 1951-05-04      3 5
#21 1953-03-04      3 6
#22 1953-03-05      3 6
#23 1953-03-06      3 6
#24 1953-03-07      3 6

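Edit: added abs() around the time difference. I think abs() cannot strictly be omitted, because when the member changes the time difference can be negative (say -2 days), and added to the member difference it could then sum to 1.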
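Even with abs(), the sum-based test can miss a break in one edge case: if one member's last date equals the next member's first date, the absolute time difference is 0 and the member difference is 1, so the sum is 1 and no new group starts. A minimal sketch that checks the two conditions separately, assuming the data is already sorted by member and time (the toy data frame `df` and the names `g_sum` and `g_sep` are made up for illustration):

```r
# Hypothetical toy data: member 1 ends on the same date member 2 starts
df <- data.frame(
  time   = as.Date(c("1951-01-01", "1951-01-02", "1951-01-02", "1951-01-03")),
  member = c(1, 1, 2, 2)
)

# Sum-based test from the answer: the member change goes unnoticed here
g_sum <- cumsum(c(1, (abs(diff(df$time)) + diff(df$member)) != 1))

# Separate tests: start a new group when the gap is not exactly one day
# OR when the member changes
new_group <- c(TRUE, as.numeric(diff(df$time)) != 1 | diff(df$member) != 0)
g_sep <- cumsum(new_group)

print(g_sum)  # 1 1 1 1
print(g_sep)  # 1 1 2 2
```

On the data from the question both versions give the same grouping; they only differ when a member boundary happens to coincide with a zero-day gap.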

Edit 2: Regarding your additional comment, try:

trainall$G <- sequence(table(trainall$g))
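The comment this edit replies to is not shown here. Assuming it asked for a running counter within each group, the line above works because table() counts the rows per group and sequence() expands each count n into 1..n. A small sketch on a made-up vector `g`, with ave() as an alternative that does not rely on the groups appearing in sorted order:

```r
# Hypothetical group labels, already in increasing order
g <- c(1, 1, 1, 2, 2, 3)

# table(g) gives the group sizes 3, 2, 1;
# sequence() turns them into 1:3, 1:2, 1:1
G_seq <- sequence(table(g))

# Alternative: number the rows within each group directly
G_ave <- ave(g, g, FUN = seq_along)

print(as.vector(G_seq))  # 1 2 3 1 2 1
print(G_ave)             # 1 2 3 1 2 1
```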

Answer 1 (score: 0)

Here is an option using .GRP from data.table:

library(data.table)
setDT(trainall)[, g := .GRP, .(member, grp = cumsum(c(FALSE, diff(time) != 1)))]
trainall
#          time member g
# 1: 1951-01-01      1 1
# 2: 1951-01-02      1 1
# 3: 1951-01-03      1 1
# 4: 1951-01-04      1 1
# 5: 1953-03-03      1 2
# 6: 1953-03-04      1 2
# 7: 1953-03-05      1 2
# 8: 1953-03-06      1 2
# 9: 1951-01-02      2 3
#10: 1951-01-03      2 3
#11: 1951-01-04      2 3
#12: 1951-01-05      2 3
#13: 1953-03-13      2 4
#14: 1953-03-14      2 4
#15: 1953-03-15      2 4
#16: 1953-03-16      2 4
#17: 1951-05-01      3 5
#18: 1951-05-02      3 5
#19: 1951-05-03      3 5
#20: 1951-05-04      3 5
#21: 1953-03-04      3 6
#22: 1953-03-05      3 6
#23: 1953-03-06      3 6
#24: 1953-03-07      3 6
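For readers without data.table, the same idea can be sketched in base R: mark where the day-to-day gap is not one, then number the distinct (member, run) pairs in order of first appearance, which is roughly the role .GRP plays above. The names `df`, `run`, `key`, and `g_base` are illustrative, the data is a made-up subset, and the rows are assumed to be sorted by member and time:

```r
# Hypothetical shortened data: member 2 starts the day after member 1 ends
time   <- c("01/01/1951", "02/01/1951", "03/01/1951",
            "04/01/1951", "13/03/1953", "14/03/1953")
member <- c(1, 1, 2, 2, 2, 2)
df <- data.frame(time = as.Date(time, format = "%d/%m/%Y"), member = member)

# Run counter that increases whenever the gap to the previous row is not one day
run <- cumsum(c(FALSE, as.numeric(diff(df$time)) != 1))

# One key per (member, run) pair, numbered in order of first appearance
key <- paste(df$member, run)
df$g_base <- match(key, unique(key))

print(df$g_base)  # 1 1 2 2 3 3
```

Including the member in the key is what separates rows 2 and 3 here: their dates are consecutive, so the run counter alone would keep them together.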