Merging overlapping interval objects with dependencies

Asked: 2019-05-09 16:15:11

Tags: python python-3.x

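I need to merge interval objects to get distinct interval ranges that depend on extra parameters. What is the best way to do this?

The goal is a clear statement of whether the state is true or not at any given hour. The returned list must contain non-overlapping time intervals.

Interval object description: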

{
    'startDate': datetime.datetime,  # start of interval
    'endDate': datetime.datetime,  # end of interval
    'prioritized': bool,  # if True, it always wins and overrides non-prioritized intervals
    'state': bool  # result of interval
}

In the examples below I changed startDate / endDate to strings to make them easier to read.

The interval list looks like this:

interval_list = [
    {'startDate': '10:00:00', 'endDate': '12:00:00', 'prioritized': False, 'state': False},
    {'startDate': '11:00:00', 'endDate': '18:00:00', 'prioritized': True, 'state': True},
    {'startDate': '13:00:00', 'endDate': '17:00:00', 'prioritized': False, 'state': False},
    {'startDate': '17:00:00', 'endDate': '20:00:00', 'prioritized': False, 'state': True},
    {'startDate': '19:30:00', 'endDate': '19:45:00', 'prioritized': True, 'state': False}
]

I am trying to achieve the following:

merge(interval_list) should return:

[
    {'startDate': '10:00:00', 'endDate': '11:00:00', 'state': False},
    {'startDate': '11:00:00', 'endDate': '19:30:00', 'state': True},
    {'startDate': '19:30:00', 'endDate': '19:45:00', 'state': False},
    {'startDate': '19:45:00', 'endDate': '20:00:00', 'state': True},
]

This is the unfinished code I have so far:

def merge_range(ranges: list):
    ranges = sorted(ranges, key=lambda x: x['startDate'])
    last_interval = dict(ranges[0])

    for current_interval in sorted(ranges, key=lambda x: x['startDate']):
        if current_interval['startDate'] > last_interval['endDate']:
            yield dict(last_interval)
            last_interval['startDate'] = current_interval['startDate']
            last_interval['endDate'] = current_interval['endDate']
            last_interval['prioritized'] = current_interval['prioritized']
            last_interval['state'] = current_interval['state']
        else:
            if current_interval['state'] == last_interval['state']:
                last_interval['endDate'] = max(last_interval['endDate'], current_interval['endDate'])
            else:
                pass # i stopped here

    yield dict(last_interval)

and I use it with merged_interval_list = list(merge_range(interval_list)).

Is this a good approach?

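2 answers:

Answer 0 (score: 1)

I found an answer to this problem:

First, I split the events into a prioritized list and a non-prioritized list.

From the prioritized list, I build the negation of its intervals for the given day, i.e. the gaps not covered by any prioritized event.

Next, I take the prioritized list as the main list and start iterating over the non-prioritized list.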

import datetime
from pprint import pprint

df = "%Y-%m-%d %H:%M:%S"
ds = "%Y-%m-%d"

events = {}
prioritized_events = {}

events["2019-05-10"] = [{
    'startDate': datetime.datetime.strptime("2019-05-10 01:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 02:00:00", df),
    'state': True
}, {
    'startDate': datetime.datetime.strptime("2019-05-10 10:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 12:00:00", df),
    'state': False
}, {
    'startDate': datetime.datetime.strptime("2019-05-10 13:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 17:00:00", df),
    'state': False
}, {
    'startDate': datetime.datetime.strptime("2019-05-10 17:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 20:00:00", df),
    'state': True
}]

prioritized_events["2019-05-10"] = [{
    'startDate': datetime.datetime.strptime("2019-05-10 11:00:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 18:00:00", df),
    'state': True
}, {
    'startDate': datetime.datetime.strptime("2019-05-10 19:30:00", df),
    'endDate': datetime.datetime.strptime("2019-05-10 20:00:00", df),
    'state': False
}]

allowed_intervals = []
for event_date in prioritized_events:
    minimal_time = datetime.datetime.combine(datetime.datetime.strptime(event_date, ds), datetime.time.min)
    maximum_time = datetime.datetime.combine(datetime.datetime.strptime(event_date, ds), datetime.time.max)

    for ev in prioritized_events[event_date]:
        if ev['startDate'] != minimal_time:
            allowed_intervals.append({
                'startDate': minimal_time,
                'endDate': ev['startDate']
            })
        # always advance past this event, even if it starts exactly at minimal_time
        minimal_time = ev['endDate']

    if prioritized_events[event_date][-1]['endDate'] != maximum_time:
        allowed_intervals.append({
            'startDate': prioritized_events[event_date][-1]['endDate'],
            'endDate': maximum_time
        })

for event_date in events:
    if event_date not in prioritized_events:
        prioritized_events[event_date] = events[event_date]
    else:
        for ev in events[event_date]:
            start = ev['startDate']
            end = ev['endDate']
            state = ev['state']
            done = False
            for allowed_interval in allowed_intervals:
                if start >= allowed_interval['startDate'] and end <= allowed_interval['endDate']:
                    prioritized_events[event_date].append({
                        'startDate': start,
                        'endDate': end,
                        'state': state
                    })
                    done = True
                    break
                elif allowed_interval['startDate'] <= start < allowed_interval['endDate'] < end:
                    prioritized_events[event_date].append({
                        'startDate': start,
                        'endDate': allowed_interval['endDate'],
                        'state': state
                    })
                    start = allowed_interval['endDate']
                elif start < allowed_interval['startDate'] and start < allowed_interval['endDate'] < end:
                    prioritized_events[event_date].append({
                        'startDate': allowed_interval['startDate'],
                        'endDate': allowed_interval['endDate'],
                        'state': state
                    })
                    start = allowed_interval['endDate']
                elif start < allowed_interval['startDate'] and start < allowed_interval['endDate'] and allowed_interval['startDate'] < end <= allowed_interval['endDate']:
                    prioritized_events[event_date].append({
                        'startDate': allowed_interval['startDate'],
                        'endDate': end,
                        'state': state
                    })
                    start = end
            if done:
                continue

    prioritized_events[event_date] = sorted(prioritized_events[event_date], key=lambda k: k['startDate'])

Now the sorted list:

pprint(prioritized_events["2019-05-10"])

prints:

[
 {'startDate': datetime.datetime(2019, 5, 10, 1, 0),
  'endDate': datetime.datetime(2019, 5, 10, 2, 0),
  'state': True
 },
 {'startDate': datetime.datetime(2019, 5, 10, 10, 0),
  'endDate': datetime.datetime(2019, 5, 10, 11, 0),
  'state': False
 },
 {'startDate': datetime.datetime(2019, 5, 10, 11, 0),
  'endDate': datetime.datetime(2019, 5, 10, 18, 0),
  'state': True
 },
 {'startDate': datetime.datetime(2019, 5, 10, 18, 0),
  'endDate': datetime.datetime(2019, 5, 10, 19, 30),
  'state': True
 },
 {'startDate': datetime.datetime(2019, 5, 10, 19, 30),
  'endDate': datetime.datetime(2019, 5, 10, 20, 0),
  'state': False
 }
]
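
The "negation" step described above (finding the parts of the day not covered by prioritized events) can be isolated into a small helper. This is only a sketch, not the code from the answer: the name free_gaps is hypothetical, and unlike the loop above it also sorts its input and tolerates overlapping events, which is an extra assumption.

```python
import datetime


def free_gaps(busy, day_start, day_end):
    """Return the gaps inside [day_start, day_end] not covered by `busy`.

    `busy` is a list of dicts with 'startDate' and 'endDate' keys.
    (Hypothetical helper; sorting and overlap handling are assumptions.)
    """
    gaps = []
    cursor = day_start  # first instant not yet known to be covered
    for ev in sorted(busy, key=lambda e: e['startDate']):
        if ev['startDate'] > cursor:
            gaps.append({'startDate': cursor, 'endDate': ev['startDate']})
        # never move the cursor backwards when events overlap
        cursor = max(cursor, ev['endDate'])
    if cursor < day_end:
        gaps.append({'startDate': cursor, 'endDate': day_end})
    return gaps


day = datetime.date(2019, 5, 10)
day_start = datetime.datetime.combine(day, datetime.time.min)
day_end = datetime.datetime.combine(day, datetime.time.max)
busy = [
    {'startDate': datetime.datetime(2019, 5, 10, 11, 0),
     'endDate': datetime.datetime(2019, 5, 10, 18, 0)},
    {'startDate': datetime.datetime(2019, 5, 10, 19, 30),
     'endDate': datetime.datetime(2019, 5, 10, 20, 0)},
]
gaps = free_gaps(busy, day_start, day_end)
```

With the prioritized events used in this answer, gaps holds three intervals: midnight to 11:00, 18:00 to 19:30, and 20:00 to the end of the day.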

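Answer 1 (score: 0)

When dealing with intervals, the main idea is to sort the dates (both starts and ends) together with their kind: start or end. Here we also need access to the original intervals in order to handle priorities and states.

Let's try it with the following list: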

interval_list = [
    {'startDate': '10:00:00', 'endDate': '12:00:00', 'prioritized': False, 'state': False},
    {'startDate': '11:00:00', 'endDate': '18:00:00', 'prioritized': True, 'state': True},
    {'startDate': '13:00:00', 'endDate': '17:00:00', 'prioritized': False, 'state': False},
    {'startDate': '17:00:00', 'endDate': '20:00:00', 'prioritized': False, 'state': True},
    {'startDate': '19:30:00', 'endDate': '19:45:00', 'prioritized': True, 'state': False}
]

First, we convert the date strings to datetimes (as you did):

import datetime

day = '2019-05-10'
def get_datetime(d, t):
    return datetime.datetime.strptime(d+" "+t, "%Y-%m-%d %H:%M:%S")

for interval in interval_list:
    interval['startDate'] = get_datetime(day, interval['startDate'])
    interval['endDate'] =  get_datetime(day, interval['endDate'])

Now we build a new list with the information we need:

L = sorted(
    [(interval['startDate'], 1, i) for i, interval in enumerate(interval_list)]
    +[(interval['endDate'], -1, i) for i, interval in enumerate(interval_list)]
)

L is the following list of tuples (date, dir, index), where dir is 1 for a start date and -1 for an end date:

[(datetime.datetime(2019, 5, 10, 10, 0), 1, 0),
 (datetime.datetime(2019, 5, 10, 11, 0), 1, 1),
 (datetime.datetime(2019, 5, 10, 12, 0), -1, 0),
 (datetime.datetime(2019, 5, 10, 13, 0), 1, 2),
 (datetime.datetime(2019, 5, 10, 17, 0), -1, 2),
 (datetime.datetime(2019, 5, 10, 17, 0), 1, 3),
 (datetime.datetime(2019, 5, 10, 18, 0), -1, 1),
 (datetime.datetime(2019, 5, 10, 19, 30), 1, 4),
 (datetime.datetime(2019, 5, 10, 19, 45), -1, 4),
 (datetime.datetime(2019, 5, 10, 20, 0), -1, 3)]
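
One detail of L worth spelling out: at 17:00 the end of interval 2 is listed before the start of interval 3. This follows from Python's tuple ordering, which compares element by element, so two boundaries with the same timestamp are ordered by the direction flag and -1 sorts before 1. A small check of that behaviour (the variable names are only for illustration):

```python
import datetime

t = datetime.datetime(2019, 5, 10, 17, 0)

# Same timestamp: the tie is broken by the second element,
# so the end marker (-1) comes before the start marker (1).
boundaries = sorted([(t, 1, 3), (t, -1, 2)])
print(boundaries)
```

As a consequence, an interval that ends exactly where another begins is closed before the next one is opened.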

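Now we can iterate over L, keeping track of the currently open intervals, and emit a date whenever the state changes according to the given priorities: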

def interval_info(i):
    interval = interval_list[i]
    return interval['state'], interval['prioritized']

T = []
stack = []
for boundary_date, direction, i in L:
    state, prioritized = interval_info(i) # state and priority of the current date
    if direction == 1: # start date
        if stack:
            prev_state, prev_prioritized = interval_info(stack[-1]) # previous infos
            if state != prev_state and prioritized >= prev_prioritized: # enter a new state with a greater or equal priority
                T.append((boundary_date, state)) # enter in new state
        else: # begin of covered area
            T.append((boundary_date, state)) # enter in new state
        stack.append(i) # add the opened interval
    elif direction == -1: # end date
        stack.remove(i) # remove the closed interval (i is a *value* in stack)
        if stack:
            prev_state, prev_prioritized = interval_info(stack[-1])
            if state != prev_state and not prev_prioritized: # leave a non priority state
                T.append((boundary_date, prev_state)) # re-enter in prev state
        else: # end of covered area
            T.append((boundary_date, None)) # enter in None state

The value of T is:

[(datetime.datetime(2019, 5, 10, 10, 0), False),
 (datetime.datetime(2019, 5, 10, 11, 0), True),
 (datetime.datetime(2019, 5, 10, 19, 30), False),
 (datetime.datetime(2019, 5, 10, 19, 45), True),
 (datetime.datetime(2019, 5, 10, 20, 0), None)]

From T you can easily produce the desired output. Hope this helps!

EDIT: Bonus: how to turn the start dates into intervals:

>>> import datetime
>>> T = [(datetime.datetime(2019, 5, 10, 10, 0), False), (datetime.datetime(2019, 5, 10, 11, 0), True), (datetime.datetime(2019, 5, 10, 19, 30), False), (datetime.datetime(2019, 5, 10, 19, 45), True), (datetime.datetime(2019, 5, 10, 20, 0), None)]
>>> [{'startDate': s[0], 'endDate': e[0], 'state': s[1]} for s,e in zip(T, T[1:])]
[{'startDate': datetime.datetime(2019, 5, 10, 10, 0), 'endDate': datetime.datetime(2019, 5, 10, 11, 0), 'state': False}, {'startDate': datetime.datetime(2019, 5, 10, 11, 0), 'endDate': datetime.datetime(2019, 5, 10, 19, 30), 'state': True}, {'startDate': datetime.datetime(2019, 5, 10, 19, 30), 'endDate': datetime.datetime(2019, 5, 10, 19, 45), 'state': False}, {'startDate': datetime.datetime(2019, 5, 10, 19, 45), 'endDate': datetime.datetime(2019, 5, 10, 20, 0), 'state': True}]

You simply zip each start date with the next one to get the intervals.
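
As a cross-check, the same result can be obtained with a different technique that is easier to reason about, at the cost of being quadratic: cut the timeline at every start and end, and decide each elementary segment by looking at all intervals that cover it. This is a sketch and not the method of this answer. It assumes that a prioritized interval beats a non-prioritized one and that, among intervals of equal priority, the one that starts last wins; the question does not state the second rule. It also assumes zero-padded 'HH:MM:SS' strings on a single day, which compare correctly as plain strings.

```python
def merge(intervals):
    """Resolve overlapping intervals segment by segment (illustrative sketch)."""
    # Every start or end is a potential change point.
    points = sorted({i['startDate'] for i in intervals}
                    | {i['endDate'] for i in intervals})
    merged = []
    for start, end in zip(points, points[1:]):
        # Intervals that fully cover the elementary segment [start, end).
        active = [i for i in intervals
                  if i['startDate'] <= start and end <= i['endDate']]
        if not active:
            continue  # uncovered gap: emit nothing
        # Assumed rule: priority first, then the latest start.
        winner = max(active, key=lambda i: (i['prioritized'], i['startDate']))
        if (merged and merged[-1]['endDate'] == start
                and merged[-1]['state'] == winner['state']):
            merged[-1]['endDate'] = end  # same state continues: extend
        else:
            merged.append({'startDate': start, 'endDate': end,
                           'state': winner['state']})
    return merged


interval_list = [
    {'startDate': '10:00:00', 'endDate': '12:00:00', 'prioritized': False, 'state': False},
    {'startDate': '11:00:00', 'endDate': '18:00:00', 'prioritized': True, 'state': True},
    {'startDate': '13:00:00', 'endDate': '17:00:00', 'prioritized': False, 'state': False},
    {'startDate': '17:00:00', 'endDate': '20:00:00', 'prioritized': False, 'state': True},
    {'startDate': '19:30:00', 'endDate': '19:45:00', 'prioritized': True, 'state': False},
]
result = merge(interval_list)
```

For the list from the question, result contains the four intervals the question asks for (10:00 to 11:00 False, 11:00 to 19:30 True, 19:30 to 19:45 False, 19:45 to 20:00 True), in the original string format.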