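Estimating the gaps between multiple date ranges

Time: 2016-09-08 02:45:17

Tags: r dplyr plyr lubridate

Is there a way to find the gaps between multiple timelines? For example, my data looks like this: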

library(plyr);library(dplyr)
library(googleVis)

df <- data.frame(Language = structure(c(rep("English",7), rep("German",5), rep("French", 10)), class = "character"),
                 Students = c(LETTERS[1:7], LETTERS[1:5], LETTERS[1:10]), 
                 Start = structure(c(16713,16713,16713,16744,16713,16714,16754,16729,16729,16729,16750,16769,
                                     16724,16724,16745,16724,16759,16766,16723,16722,16736,16796), class = "Date"), 
                 End = structure(c(16762,16720,16762,16755,16720,16764,16762,16765,16765,16749,16761,16770,16758,
                                   16744,16758,16764,16765,16766,16726,16723,16758,16806), class = "Date"))

ddply(df, .(Language), summarise,
      FirstDay = min(Start),
      LastDay = max(End), 
      Duration = LastDay - FirstDay)

plot(gvisTimeline(data = df, rowlabel = "Language", start = "Start", end = "End", options = list(width = 600, height = 1000)))

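I want to compute the gaps, that is, the periods during which no student is attending class. The gaps are highlighted in red in the figure below.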

[Figure: timeline chart of the classes, with the gaps highlighted in red]

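1 answer:

Answer 0: (score: 5)

This is a fairly classic problem. Assuming the rows are sorted by start date, the approach is to keep only the rows whose start date is later than the maximum end date of all preceding rows. lag() and cummax() can be used to find that previous maximum end date. Because cummax() is not defined for the Date class, we convert End to integer, apply cummax(), and convert the result back to Date. The + 1 in the filter means that a class starting on the day right after the previous maximum end date is not counted as a gap: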

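library(dplyr)
df %>%
  arrange(Start) %>%    # the logic below assumes rows are sorted by start date
  group_by(Language) %>%
  # largest End among the preceding rows; cummax() needs integers, not Dates
  mutate(End_Max = lag(as.Date(cummax(as.integer(End)), origin = "1970-01-01"))) %>%
  # keep rows that start more than one day after that running maximum
  filter(Start > End_Max + 1) %>%
  select(Language, End_Max, Start)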

# Source: local data frame [2 x 3]
# Groups: Language [2]

#  Language    End_Max      Start
#    <fctr>     <date>     <date>
#1   German 2015-11-26 2015-11-30
#2   French 2015-11-27 2015-12-27
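
To see why the running maximum matters, rather than just the previous row's End, here is a minimal base R sketch on made-up data. The `toy` data frame and every variable name in it are hypothetical and not part of the question. The second and third intervals are nested inside the first, so comparing each Start only with the preceding End would report a gap that does not exist.

```r
# Toy data (hypothetical): rows 2 and 3 lie entirely inside row 1.
toy <- data.frame(
  Start = as.Date(c("2015-10-01", "2015-10-03", "2015-10-08", "2015-10-25")),
  End   = as.Date(c("2015-10-20", "2015-10-05", "2015-10-15", "2015-10-30"))
)
toy <- toy[order(toy$Start), ]

# Running maximum of End, shifted down one row so each row sees only the
# rows before it. cummax() is applied to integer day counts because it is
# not defined for Date objects.
run_max  <- cummax(as.integer(toy$End))
prev_max <- as.Date(c(NA, head(run_max, -1)), origin = "1970-01-01")

# Naive alternative: compare only with the immediately preceding End.
prev_end <- c(as.Date(NA), head(toy$End, -1))

is_gap   <- !is.na(prev_max) & toy$Start > prev_max + 1
is_naive <- !is.na(prev_end) & toy$Start > prev_end + 1

# One row per real gap, with its first and last class-free day and its length.
gaps <- data.frame(
  GapStart = prev_max[is_gap] + 1,
  GapEnd   = toy$Start[is_gap] - 1,
  Days     = as.integer(toy$Start[is_gap] - prev_max[is_gap]) - 1L
)
print(gaps)

sum(is_gap)    # 1 real gap: 2015-10-21 to 2015-10-24 (4 days)
sum(is_naive)  # 2: the naive check also flags the row starting 2015-10-08
```

The same GapStart, GapEnd and Days columns could be derived from End_Max and Start in the dplyr result above if the length of each gap is needed rather than just its boundaries.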