Time range overlap algorithm in Python

Date: 2016-11-08 11:58:29

Tags: python algorithm datetime

I have a list of different IDs with start and end dates, let's say:

[
(5, d.datetime(2010, 9, 19, 0, 0, 0),    d.datetime(2010, 9, 19, 0, 5, 10)),
(6, d.datetime(2010, 9, 19, 0, 0, 0),    d.datetime(2010, 9, 19, 12, 59, 59)),
(4, d.datetime(2010, 9, 19, 10, 30, 17), d.datetime(2010, 9, 19, 20, 20, 59)),
(6, d.datetime(2010, 9, 19, 14, 12, 0),  d.datetime(2010, 9, 19, 23, 59, 59)),
(5, d.datetime(2010, 9, 19, 17, 0, 22),  d.datetime(2010, 9, 19, 19, 14, 20))
]

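I need to somehow find the overlapping time ranges and build a new list containing, for each time range, the IDs that cover it. For the list above, the result should be: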

[
('5,6',   d.datetime(2010, 9, 19, 0, 0, 0),    d.datetime(2010, 9, 19, 0, 5, 10)),
('6',     d.datetime(2010, 9, 19, 0, 5, 10),   d.datetime(2010, 9, 19, 10, 30, 17)),
('4,6',   d.datetime(2010, 9, 19, 10, 30, 17), d.datetime(2010, 9, 19, 12, 59, 59)),
('4',     d.datetime(2010, 9, 19, 12, 59, 59), d.datetime(2010, 9, 19, 14, 12, 0)),
('4,6',   d.datetime(2010, 9, 19, 14, 12, 0),  d.datetime(2010, 9, 19, 17, 0, 22)),
('4,5,6', d.datetime(2010, 9, 19, 17, 0, 22),  d.datetime(2010, 9, 19, 19, 14, 20)),
('4,6',   d.datetime(2010, 9, 19, 19, 14, 20), d.datetime(2010, 9, 19, 20, 20, 59)),
('6',     d.datetime(2010, 9, 19, 20, 20, 59), d.datetime(2010, 9, 19, 23, 59, 59))
]

[Figure: visual concept of the overlapping time ranges]

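My current solution works like this: I take the minimum and maximum dates of the whole range, then iterate from min_date to max_date in 1-second steps. For each second, I check which intervals from the target list contain it, save the matched IDs as a dictionary key, and append the current time from the iterator to the list stored as the value; each such dict goes into a parent list, and so on for every following second. At the end, I go through all the dicts in the parent list and take each ID key together with the first and last dates in its value list as the range I'm looking for. However, this solution is very slow when I compute ranges spanning a month, because iterating over a whole month second by second takes far too long.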

Here is the code:

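    from dateutil.relativedelta import relativedelta

    # ranges: list of dicts with 'id', 'start' and 'end' keys (input data)
    sorted_ranges_by_start = sorted(ranges, key=lambda r: r['start'])
    sorted_ranges_by_end = sorted(ranges, key=lambda r: r['end'])


    def delta(start, end, step):
        # Yield every timestamp from start (inclusive) to end (exclusive)
        cur = start
        while cur < end:
            yield cur
            cur += step

    final_ranges = []
    last_result = None
    i = -1
    for checker_date in delta(
            sorted_ranges_by_start[0]['start'],
            sorted_ranges_by_end[-1]['end'],
            relativedelta(seconds=1)):

        # Collect the IDs of all ranges that contain this second
        aggregator = []
        for rng in ranges:
            if rng['start'] <= checker_date <= rng['end']:
                aggregator.append(str(rng['id']))

        if aggregator:
            # Sort so the same set of IDs always produces the same key
            ids = ','.join(sorted(set(aggregator)))
            if last_result != ids:
                # A new combination of IDs starts a new block
                final_ranges.append({})
                last_result = ids
                i += 1

            if ids not in final_ranges[i]:
                final_ranges[i][ids] = []

            final_ranges[i][ids].append(checker_date)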

But, as I said, it is very slow on large ranges.
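To put a rough number on "slow": a back-of-envelope sketch, assuming a 31-day month and the one-second step used above, counts how many `checker_date` values the loop visits and how many interval checks that implies, since the inner `for rng in ranges` loop tests every range at every second. The five ranges from the example are used as the assumed range count.

```python
from datetime import timedelta

# One outer-loop iteration per second over an assumed 31-day span
seconds_per_month = int(timedelta(days=31).total_seconds())

# The inner loop tests every range at every second
num_ranges = 5  # the five ranges from the example above
checks = seconds_per_month * num_ranges

print(seconds_per_month)  # 2678400
print(checks)             # 13392000
```

The cost grows with the elapsed time multiplied by the number of ranges, which is why a month-long span is so much slower than the one-day example.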

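So please help me find an algorithm that can do this without iterating second by second, or some way to speed up the iteration (not sure; maybe by writing this part in C and embedding it in Python).

Thanks.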

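3 Answers:

Answer 0 (score: 1)

I have done it with the code below.

The basic idea is to first detect the cut points of the given periods, i.e. the points where each period starts or ends. Then iterate only over consecutive pairs of cut points, rather than over the elapsed time, and check each period for overlap to see whether it is active between those cut points. Accumulate the active periods.

The processing time depends on the number of cut points and periods, not on the elapsed time.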

from datetime import datetime
from sortedcontainers import SortedSet

periods = [
    (5, datetime(2010, 9, 19, 0, 0, 0),    datetime(2010, 9, 19, 0, 5, 10)),
    (6, datetime(2010, 9, 19, 0, 0, 0),    datetime(2010, 9, 19, 12, 59, 59)),
    (4, datetime(2010, 9, 19, 10, 30, 17), datetime(2010, 9, 19, 20, 20, 59)),
    (6, datetime(2010, 9, 19, 14, 12, 0),  datetime(2010, 9, 19, 23, 59, 59)),
    (5, datetime(2010, 9, 19, 17, 0, 22),  datetime(2010, 9, 19, 19, 14, 20))
]

cutpoints = SortedSet()

for period in periods:
    cutpoints.add(period[1])
    cutpoints.add(period[2])

ranges = []

start_cutpoint = None
for end_cutpoint in cutpoints:

    if start_cutpoint is None:  # skip the first cut point
        start_cutpoint = end_cutpoint
        continue

    cut_point_active_periods = []

    for period in periods:

        # check if period and cutpoint range overlap
        start_overlap = max(start_cutpoint, period[1])
        end_overlap = min(end_cutpoint, period[2])

        if start_overlap < end_overlap:
            cut_point_active_periods.append(period[0])

    ranges.append((cut_point_active_periods, start_cutpoint, end_cutpoint))
    start_cutpoint = end_cutpoint

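Answer 1 (score: 0)

Make two records for each time interval: {id, time, start/end}

Sort the list of all these records by time. If the times are equal, compare the start/end fields and put the end record first.

Walk through the list.

When you encounter a start record, output the active list labeled with the span from last time to the current time (if the list is not empty and the span is not empty), then add the ID to the active list and set last time to the current time.

When you encounter an end record, output the active list labeled with the span from last time to the current time, then remove the ID from the active list and update last time.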

Suppose we have the intervals:

 A: 0..3
 B: 1..2
 C: 2..4

Records:

 (A,0,s), (A,3,e), (B,1,s), (B,2,e), (C,2,s), (C,4,e)

Sorted:

 (A,0,s), (B,1,s), (B,2,e), (C,2,s), (A,3,e), (C,4,e)

Walking the sorted list:

  current      active      output         last time
  (A,0,s)      A            -              0           
  (B,1,s)      A,B        A 0..1           1
  (B,2,e)      A          A,B 1..2         2
  (C,2,s)      A,C          -              2
  (A,3,e)      C          A,C  2..3        3
  (C,4,e)      -          C 3..4           4
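The walk-through above can be sketched in Python roughly as follows. This is a minimal sketch under a few assumptions: records are stored as `(time, kind, id)` tuples, end is encoded as 0 and start as 1 so that sorting on `(time, kind)` puts end records first on ties, and the active list is kept as a count per ID so that two overlapping intervals with the same ID are handled (a small departure from a plain list). The function name `sweep` is hypothetical. Any comparable values work as times, so the toy intervals from the walk-through can be used directly, and `datetime` values work the same way.

```python
START, END = 1, 0  # END sorts before START when times are equal


def sweep(intervals):
    """Sweep-line split of (id, start, end) intervals into covered slices."""
    # Two records per interval: (time, kind, id)
    records = []
    for pid, start, end in intervals:
        records.append((start, START, pid))
        records.append((end, END, pid))
    # Sort by time; at equal times END (0) comes before START (1)
    records.sort(key=lambda r: (r[0], r[1]))

    active = {}  # id -> number of currently open intervals with that id
    last_time = None
    output = []
    for time, kind, pid in records:
        # Emit the active set for the non-empty span since the last record
        if active and last_time < time:
            ids = ','.join(str(i) for i in sorted(active))
            output.append((ids, last_time, time))
        if kind == START:
            active[pid] = active.get(pid, 0) + 1
        else:
            active[pid] -= 1
            if active[pid] == 0:
                del active[pid]
        last_time = time
    return output


print(sweep([('A', 0, 3), ('B', 1, 2), ('C', 2, 4)]))
# [('A', 0, 1), ('A,B', 1, 2), ('A,C', 2, 3), ('C', 3, 4)]
```

Sorting dominates, so apart from building the ID string for each output row this takes O(n log n) for n intervals, independent of how much time the intervals span.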

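Answer 2 (score: 0)

This was a programming challenge for me, but I finally managed it. Basically, I sort all the times together with their IDs and then run a for loop to get the result: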

from datetime import datetime

timelist = [
    (5, datetime(2010, 9, 19, 0, 0, 0), datetime(2010, 9, 19, 0, 5, 10)),
    (6, datetime(2010, 9, 19, 0, 0, 0), datetime(2010, 9, 19, 12, 59, 59)),
    (4, datetime(2010, 9, 19, 10, 30, 17), datetime(2010, 9, 19, 20, 20, 59)),
    (6, datetime(2010, 9, 19, 14, 12, 0), datetime(2010, 9, 19, 23, 59, 59)),
    (5, datetime(2010, 9, 19, 17, 0, 22), datetime(2010, 9, 19, 19, 14, 20))
]

timelist_new = []
for time in timelist:
    timelist_new.append((time[0], time[1], 'begin'))
    timelist_new.append((time[0], time[2], 'end'))

timelist_new = sorted(timelist_new, key=lambda x: x[1])

key = None
keylist = set()
aggregator = []

for idx in range(len(timelist_new) - 1):
    t1 = timelist_new[idx]
    t2 = timelist_new[idx + 1]
    t1_key = str(t1[0])
    t2_key = str(t2[0])
    t1_dt = t1[1]
    t2_dt = t2[1]
    t1_pointer = t1[2]
    t2_pointer = t2[2]

    if t1_dt == t2_dt:
        keylist.add(t1_key)
        keylist.add(t2_key)
    elif t1_dt < t2_dt:
        if t1_pointer == 'begin':
            keylist.add(t1_key)
        if t1_pointer == 'end':
            keylist.discard(t1_key)

    key = ','.join(sorted(keylist))
    aggregator.append((key, t1_dt, t2_dt))


for stuff in aggregator:
    print(stuff)

Output:

('5,6', datetime.datetime(2010, 9, 19, 0, 0), datetime.datetime(2010, 9, 19, 0, 0))
('5,6', datetime.datetime(2010, 9, 19, 0, 0), datetime.datetime(2010, 9, 19, 0, 5, 10))
('6', datetime.datetime(2010, 9, 19, 0, 5, 10), datetime.datetime(2010, 9, 19, 10, 30, 17))
('4,6', datetime.datetime(2010, 9, 19, 10, 30, 17), datetime.datetime(2010, 9, 19, 12, 59, 59))
('4', datetime.datetime(2010, 9, 19, 12, 59, 59), datetime.datetime(2010, 9, 19, 14, 12))
('4,6', datetime.datetime(2010, 9, 19, 14, 12), datetime.datetime(2010, 9, 19, 17, 0, 22))
('4,5,6', datetime.datetime(2010, 9, 19, 17, 0, 22), datetime.datetime(2010, 9, 19, 19, 14, 20))
('4,6', datetime.datetime(2010, 9, 19, 19, 14, 20), datetime.datetime(2010, 9, 19, 20, 20, 59))
('6', datetime.datetime(2010, 9, 19, 20, 20, 59), datetime.datetime(2010, 9, 19, 23, 59, 59))


Just drop the first line of the output, since its start and end dates are the same :)

final_list = []
for stuff in aggregator:
    if stuff[1] != stuff[2]:
        final_list.append(stuff)