Suppose I have the following dataset:
data
Group Date
A 2016-03-10
A 2016-03-11
A 2016-03-12
A 2016-04-13
A 2016-04-14
A 2016-05-15
A 2016-05-16
A 2016-05-17
B 2016-02-11
B 2016-02-12
B 2016-02-13
B 2016-02-19
B 2016-03-15
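I want to find the distinct date intervals within each group. For example, for group A, 2016-03-10 to 2016-03-12 should be interval 1, 2016-04-13 to 2016-04-14 should be interval 2, and 2016-05-15 to 2016-05-17 should be interval 3. I want to find every place where the run of consecutive dates breaks and how many breaks occur in each group, so that I can analyze them. This should be computed separately for each group. My ideal output is the following: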
Group Date Interval
A 2016-03-10 1
A 2016-03-11 1
A 2016-03-12 1
A 2016-04-13 2
A 2016-04-14 2
A 2016-05-15 3
A 2016-05-16 3
A 2016-05-17 3
B 2016-02-11 1
B 2016-02-12 1
B 2016-02-13 1
B 2016-02-19 2
B 2016-03-15 3
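Here is my attempt:
data %>% group_by(Group) %>% mutate(Date - lag(Date))
This gives NA for the first row, 1 whenever the date changes, and 0 when it does not. What I want instead is an interval number such as 1, 2, 3 for each run of dates.
Update: a dataset for which this does not work: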
group date count
(factor) (date)
1 Albany 2016-02-15 55
2 Albany 2016-02-16 1
3 Albany 2016-04-08 40
Answer 0 (score: 6)
You can apply cumsum to a logical vector that is TRUE wherever the difference between consecutive dates is not 1:
df %>%
group_by(Group) %>%
mutate(Interval = cumsum(Date - lag(Date, default = first(Date)) != 1))
# Source: local data frame [13 x 3]
# Groups: Group [2]
# Group Date Interval
# <fctr> <date> <int>
#1 A 2016-03-10 1
#2 A 2016-03-11 1
#3 A 2016-03-12 1
#4 A 2016-04-13 2
#5 A 2016-04-14 2
#6 A 2016-05-15 3
#7 A 2016-05-16 3
#8 A 2016-05-17 3
#9 B 2016-02-11 1
#10 B 2016-02-12 1
#11 B 2016-02-13 1
#12 B 2016-02-19 2
#13 B 2016-03-15 3
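To see why the cumsum expression produces interval numbers, it can help to unroll it for a single group. The following is a minimal base R sketch using the group A dates from the question; the names dates, gap, is_break and interval are illustrative and not part of the answer above.

```r
# Dates for group A, copied from the question
dates <- as.Date(c("2016-03-10", "2016-03-11", "2016-03-12",
                   "2016-04-13", "2016-04-14",
                   "2016-05-15", "2016-05-16", "2016-05-17"))

# Step 1: gap in days to the previous row. The first row is compared with
# itself, mirroring lag(Date, default = first(Date)), so its gap is 0.
gap <- as.numeric(dates - c(dates[1], head(dates, -1)))

# Step 2: TRUE wherever the gap is not exactly one day (this includes row 1)
is_break <- gap != 1

# Step 3: the running count of TRUE values is the interval number
interval <- cumsum(is_break)

print(data.frame(Date = dates, gap, is_break, interval))
```

Because the first row always has a gap of 0, it always counts as a break, which is what makes the numbering start at 1. One consequence worth keeping in mind: a date repeated within a group also has a gap of 0, so the != 1 test treats it as the start of a new interval.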
Data:
df = structure(list(Group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Date = structure(c(16870, 16871, 16872, 16904, 16905, 16936,
16937, 16938, 16842, 16843, 16844, 16850, 16875), class = "Date")), .Names = c("Group",
"Date"), row.names = c(NA, -13L), class = "data.frame")
Answer 1 (score: 0)
This is more or less a duplicate of this question: Group rows in data frame based on time difference between consecutive rows
Basically you want to do these two operations:
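A minimal base R sketch of two such operations, assuming they are (1) flagging every row whose gap to the previous date in the same group is not one day and (2) taking the cumulative sum of those flags within each group. The data frame is rebuilt from the question, and the rows are assumed to be sorted by date within each group.

```r
# Sample data from the question
df <- data.frame(
  Group = c(rep("A", 8), rep("B", 5)),
  Date = as.Date(c("2016-03-10", "2016-03-11", "2016-03-12",
                   "2016-04-13", "2016-04-14",
                   "2016-05-15", "2016-05-16", "2016-05-17",
                   "2016-02-11", "2016-02-12", "2016-02-13",
                   "2016-02-19", "2016-03-15")),
  stringsAsFactors = FALSE
)

# ave() applies the function to the dates of each group separately.
# Operation 1: diff(d) != 1 flags gaps other than one day; the leading TRUE
#              marks the first row of each group as the start of interval 1.
# Operation 2: cumsum() turns the flags into a running interval number.
df$Interval <- ave(as.numeric(df$Date), df$Group, FUN = function(d) {
  cumsum(c(TRUE, diff(d) != 1))
})

print(df)
```

This avoids the dplyr dependency; ave() returns a numeric column, so wrap the result in as.integer() if an integer column is preferred.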