Question

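Folks, my problem is finding whether multiple date-time ranges overlap each other and, if they do, the time period in which they overlap. I have gone through the following links, Determine Whether Two Date Ranges Overlap and Algorithm to detect overlapping periods, among others.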
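The pairwise test from the first linked question can be sketched in R as follows. This is a minimal sketch, assuming the times have already been converted to minutes since midnight; `overlaps()` and `overlap_period()` are hypothetical helper names, and ranges that merely touch (one ends exactly when the other starts) are treated as not overlapping.

```r
# Hypothetical helper: do ranges [s1, e1) and [s2, e2) overlap?
# Times are minutes since midnight (an assumption for this sketch).
overlaps <- function(s1, e1, s2, e2) s1 < e2 && s2 < e1

# Hypothetical helper: the overlapping period, or NULL if there is none.
overlap_period <- function(s1, e1, s2, e2) {
  if (!overlaps(s1, e1, s2, e2)) return(NULL)
  c(start = max(s1, s2), end = min(e1, e2))
}

# sw1 (10:00-11:00) against the first sw2 interval (9:30-10:15)
overlap_period(600, 660, 570, 615)
# returns c(start = 600, end = 615), i.e. 10:00 to 10:15
```

This only handles two ranges at a time; the n-switch case needs either a fold over switches or a sweep over all on/off events.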

I don't know whether this is correct; here is a sample explanation with n = 3.

Say I have 'n' switches sw1, sw2 & sw3. State is the ON/OFF state, i.e. 1/0.

Switches,State,Intime,Outtime

sw3,1,9:00:00,10:40:00
sw2,1,9:30:00,10:15:00
sw1,1,10:00:00,11:00:00
sw2,1,10:20:00,10:30:00

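I have come across this possibility; there may be more, and I am still looking for others. Here the common time periods are 10:00 to 10:15, i.e. 15 minutes, and 10:20 to 10:30, i.e. 10 minutes. The combined time during which all of these switches are ON ('1') is 25 minutes.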

                 10:00                           11:00
              sw1 |-----------------------------------|
       9:30       10:15   10:20     10:30
     sw2 |-------------|      |-------|
 9:00                                     10:40 
sw3 |----------------------------------------|

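Generalizing this to n overlapping switches is a daunting task. I am still working on it, so any suggestions or modifications are welcome. Thanks.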
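One way to generalize to n switches without enumerating every minute is a sweep line: sort all on/off events and keep a running count of switches that are on; wherever the count equals the number of switches, all of them are on. This is an alternative technique, not taken from either answer below. It is a sketch assuming times are already minutes since midnight and intervals are half-open [Intime, Outtime); `merge_intervals()` and `all_on_periods()` are hypothetical names. Each switch's own intervals are merged first so that one switch can never be counted twice.

```r
# Sample data from the question, times as minutes since midnight (assumption).
sw    <- c("sw3", "sw2", "sw1", "sw2")
start <- c(540, 570, 600, 620)   # 9:00, 9:30, 10:00, 10:20
end   <- c(640, 615, 660, 630)   # 10:40, 10:15, 11:00, 10:30

# Hypothetical helper: merge one switch's overlapping or touching intervals.
merge_intervals <- function(s, e) {
  o <- order(s)
  s <- s[o]
  e <- e[o]
  out_s <- s[1]
  out_e <- e[1]
  for (i in seq_along(s)[-1]) {
    k <- length(out_e)
    if (s[i] <= out_e[k]) {
      out_e[k] <- max(out_e[k], e[i])   # overlaps or touches: extend
    } else {
      out_s <- c(out_s, s[i])           # gap: start a new interval
      out_e <- c(out_e, e[i])
    }
  }
  list(start = out_s, end = out_e)
}

# Hypothetical helper: periods during which every switch is on.
all_on_periods <- function(sw, start, end) {
  merged <- lapply(split(data.frame(start, end), sw),
                   function(d) merge_intervals(d$start, d$end))
  n <- length(merged)
  s <- unlist(lapply(merged, `[[`, "start"), use.names = FALSE)
  e <- unlist(lapply(merged, `[[`, "end"), use.names = FALSE)
  # Events: +1 when a switch turns on, -1 when it turns off.
  time  <- c(s, e)
  delta <- c(rep(1, length(s)), rep(-1, length(e)))
  # Sort by time; at equal times process -1 first (half-open intervals).
  o <- order(time, delta)
  time <- time[o]
  active <- cumsum(delta[o])
  # Between event i and event i + 1, active[i] switches are on.
  idx <- which(head(active, -1) == n)
  out <- data.frame(start = time[idx], end = time[idx + 1])
  out[out$end > out$start, ]
}

res <- all_on_periods(sw, start, end)
res                          # 600-615 and 620-630, i.e. 10:00-10:15 and 10:20-10:30
sum(res$end - res$start)     # 25
```

The cost is dominated by one sort of the events, so it does not grow with the length of the time span the way per-minute enumeration does.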

Answer 1

1) Based on the sample data, we assume the data are of the form hh:mm:00 where hh < 24.

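Read in the test data. Create two functions: one that converts a string of the form hh:mm:00 to a number of minutes, and one that converts a number of minutes to a chron "times" object. Create a sequence of minutes for each row of data, giving the list Intervals. Union the sequences that correspond to the same switch, giving the list Intervals.u, and then intersect the components of that list to give the sequence Intersection. Compute the runs, r, in Intersection to give a set of start and end points. Finally, compute the number of minutes and convert it to a duration of class "times". (The number of minutes and the duration depend only on r and Intersection, so the lines ending in ## can be skipped if intervals.df is not needed.)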

# test data
Lines <- "Switches,State,Intime,Outtime
sw3,1,9:00:00,10:40:00
sw2,1,9:30:00,10:15:00
sw1,1,10:00:00,11:00:00
sw2,1,10:20:00,10:30:00"
DF <- read.csv(text = Lines, as.is = TRUE)

library(chron)

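# minutes since midnight; the small epsilon guards against floating-point round-off
to.num <- function(x) floor(as.numeric(times(x)) * 24 * 60 + 1e-6)
# minutes back to a chron "times" object
to.times <- function(x) times(x / (24 * 60))

# minute-by-minute sequence for row r, inclusive of both Intime and Outtime
Seq <- function(r) seq(to.num(DF$Intime[r]), to.num(DF$Outtime[r]))
Intervals <- lapply(1:nrow(DF), Seq)
# union of each switch's sequences, then intersection across switches
Intervals.u <- lapply(split(Intervals, DF$Switches), 
     function(L) Reduce(union, L))
Intersection <- Reduce(intersect, Intervals.u)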

r <- rle(c(FALSE, diff(Intersection) == 1))

i.ends <- cumsum(r$lengths)[r$values] ##
ends <- to.times(Intersection[i.ends]) ##
starts <- ends - to.times(r$lengths[r$values]) ##
intervals.df <- data.frame(start = starts, end = ends); intervals.df ##
##         start      end
##    1 10:00:00 10:15:00
##    2 10:20:00 10:30:00

mins <- length(Intersection) - sum(r$values); mins
## [1] 25
duration <- to.times(mins); duration
## [1] 00:25:00
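To see what the rle() step does, here is a toy run on a hand-made vector (the values are made up for illustration). It also shows an edge case worth knowing about: `length(Intersection) - sum(r$values)` subtracts one per TRUE run, so a block consisting of a single isolated minute (possible when two ranges only touch) adds 1 minute instead of 0. Subtracting the number of blocks instead avoids that.

```r
# Toy stand-in for Intersection (assumed values): minutes 600-603 and 610-612.
v <- c(600:603, 610:612)

# TRUE marks a minute that directly follows the previous one.
r <- rle(c(FALSE, diff(v) == 1))
r$lengths                  # 1 3 1 2
r$values                   # FALSE TRUE FALSE TRUE
r$lengths[r$values]        # 3 2 (minutes in each contiguous block)
length(v) - sum(r$values)  # 5

# Edge case: an isolated minute (two ranges that only touch at 610).
w <- c(600:603, 610)
length(w) - sum(rle(c(FALSE, diff(w) == 1))$values)  # 4, although the true total is 3
# Each block starts at a FALSE, so subtracting the number of blocks is exact.
length(w) - sum(!c(FALSE, diff(w) == 1))              # 3
```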

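2) Regarding the comment about speed, we can use the IRanges package (from Bioconductor), which encodes ranges efficiently and also reduces the code size slightly: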

library(IRanges)
Intervals <- IRanges(to.num(DF$Intime), to.num(DF$Outtime))
Intersection <- Reduce(intersect, split(Intervals, DF$Switches))

intervals.df <- data.frame(start = to.times(start(Intersection)), 
                           end = to.times(end(Intersection)))
intervals.df
##      start      end
## 1 10:00:00 10:15:00
## 2 10:20:00 10:30:00

mins <- sum(width(Intersection) - 1); mins
## [1] 25
duration <- to.times(mins); duration
## [1] 00:25:00
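If installing IRanges is not an option, the same fold over switches can be written in base R with a two-pointer merge of sorted interval lists. This is a sketch of a different implementation, not what IRanges does internally. It assumes times are already numeric minutes and that each switch's own intervals are non-overlapping (true for the sample data), and it treats intervals as continuous, so a duration is end - start rather than width - 1. `intersect_intervals()` and `iv` are hypothetical names.

```r
# Hypothetical helper: intersect two sorted, non-overlapping interval lists,
# each a data.frame with numeric columns 'start' and 'end'.
intersect_intervals <- function(a, b) {
  i <- 1
  j <- 1
  out_s <- numeric(0)
  out_e <- numeric(0)
  while (i <= nrow(a) && j <= nrow(b)) {
    s <- max(a$start[i], b$start[j])
    e <- min(a$end[i], b$end[j])
    if (s < e) {
      out_s <- c(out_s, s)
      out_e <- c(out_e, e)
    }
    # Advance whichever interval finishes first.
    if (a$end[i] < b$end[j]) i <- i + 1 else j <- j + 1
  }
  data.frame(start = out_s, end = out_e)
}

# The question's data as minutes since midnight (illustrative).
iv <- data.frame(sw    = c("sw3", "sw2", "sw1", "sw2"),
                 start = c(540, 570, 600, 620),
                 end   = c(640, 615, 660, 630))

# One sorted interval list per switch, then fold the intersection across switches.
per_switch <- lapply(split(iv[c("start", "end")], iv$sw),
                     function(d) d[order(d$start), ])
common <- Reduce(intersect_intervals, per_switch)
common                            # 600-615 and 620-630
sum(common$end - common$start)    # 25
```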

Update: some fixes and better variable names. Further improvements. Added (2).

Answer 2

One way to do this is:

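Calculate the unique minutes/seconds between the Intime and Outtime of each switch. E.g. if a switch turns on at 9:00 and off at 9:02, its unique minutes are 9:00 and 9:01.
Count how many times each unique minute/second occurs across all switches.
If any minute/second occurs as many times as there are switches (i.e. 3 in your case), then all the switches must have been on during that minute/second.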

Using that logic, here is a potential solution (with your data stored in the data frame x):

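# Your data, read into the data frame x.
x <- read.csv(text = "Switches,State,Intime,Outtime
sw3,1,9:00:00,10:40:00
sw2,1,9:30:00,10:15:00
sw1,1,10:00:00,11:00:00
sw2,1,10:20:00,10:30:00", stringsAsFactors = FALSE)

# Function to convert string to time (UTC avoids daylight-saving surprises).
asTime <- function (tm) as.POSIXlt(tm, format = '%H:%M:%S', tz = 'UTC')

# Calculate unique minutes between Intimes and Outtimes.
minSpan <- function (start, end) seq(asTime(start), asTime(end) - 1, 'min')

# Calculate the time span in minutes for each row
# (SIMPLIFY = FALSE keeps a list of date-times even if all spans have equal length).
spans <- mapply(minSpan, x$Intime, x$Outtime, SIMPLIFY = FALSE)

# Count how many times each minute appears.
counts <- table(do.call(c, spans))

# Total number of switches.
switches <- length(unique(x$Switches))

# Count minutes where all switches have been on.
length(counts[counts == switches])
## [1] 25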
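One caveat with counting occurrences: if the same switch has two intervals that overlap each other, its minutes are counted twice, so a minute could reach the switch count while some other switch is actually off. A hedged variant counts distinct switches per minute instead. It assumes times are already integer minutes since midnight; the `sw_df` name and its `start`/`end` columns are illustrative.

```r
# Illustrative data: the question's sample with times as minutes since midnight.
sw_df <- data.frame(Switches = c("sw3", "sw2", "sw1", "sw2"),
                    start = c(540, 570, 600, 620),
                    end   = c(640, 615, 660, 630))

# One row per (switch, minute) pair; the end minute is excluded, as in minSpan().
long <- do.call(rbind, lapply(seq_len(nrow(sw_df)), function(i)
  data.frame(sw = sw_df$Switches[i],
             minute = seq(sw_df$start[i], sw_df$end[i] - 1))))

# Drop duplicate pairs so a switch counts at most once per minute.
long <- unique(long)

counts <- table(long$minute)
sum(counts == length(unique(sw_df$Switches)))   # 25
```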

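This gives you one-minute precision, since that seems to be what you show in your question. You can easily change it to seconds, though: just change 'min' to 'sec' in minSpan().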

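In minSpan(), I subtract 1 from Outtime before building the sequence (date-time arithmetic in R is in seconds, so this moves the end back by one second, which is enough to exclude the Outtime minute itself):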

minSpan <- function (start, end) seq(asTime(start), asTime(end) - 1, 'min')

That's because if you computed, say, 10:00 to 10:02, seq() would return 3 minutes: 10:00, 10:01, 10:02. But the switch actually turned off at 10:02, so you really only want 10:00 to 10:01.
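A quick check of that end-point handling (`tz = "UTC"` is used here only so the result does not depend on the local time zone):

```r
a <- as.POSIXct("10:00:00", format = "%H:%M:%S", tz = "UTC")
b <- as.POSIXct("10:02:00", format = "%H:%M:%S", tz = "UTC")

format(seq(a, b, "min"), "%H:%M")       # "10:00" "10:01" "10:02"
format(seq(a, b - 1, "min"), "%H:%M")   # "10:00" "10:01"
format(b - 1, "%H:%M:%S")               # "10:01:59", one second earlier
```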

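Anyway, this solution seems to work for the small example you gave. Depending on how big your data is, I'd expect it to be quite slow, but that may not be a problem.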

datetime - Determine whether multiple (n) datetime ranges overlap each other in R
