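I have a list of tuples where each tuple is a (start-time, end-time) pair. I am trying to merge all overlapping time ranges and return a list of distinct time ranges.
For example: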
[(1, 5), (2, 4), (3, 6)] ---> [(1,6)]
[(1, 3), (2, 4), (5, 8)] ---> [(1, 4), (5,8)]
Here is how I implemented it.
# Algorithm
# initialranges: [(a,b), (c,d), (e,f), ...]
# First we sort each tuple then whole list.
# This will ensure that a<b, c<d, e<f ... and a < c < e ...
# BUT the order of b, d, f ... is still random
# Now we have only 3 possibilities
#================================================
# b<c<d:  a-------b           Ans: [(a,b),(c,d)]
#                   c---d
# c<=b<d: a-------b           Ans: [(a,d)]
#               c---d
# c<d<b:  a-------b           Ans: [(a,b)]
#           c---d
#================================================
def mergeoverlapping(initialranges):
    i = sorted(set([tuple(sorted(x)) for x in initialranges]))
    # initialize final ranges to [(a,b)]
    f = [i[0]]
    for c, d in i[1:]:
        a, b = f[-1]
        if c<=b<d:
            f[-1] = a, d
        elif b<c<d:
            f.append((c,d))
        else:
            # else case included for clarity. Since
            # we already sorted the tuples and the list
            # only remaining possibility is c<d<b
            # in which case we can silently pass
            pass
    return f
I am wondering whether there is a more efficient or more Pythonic way to do this.
Any help is appreciated. Thanks!
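Answer 0 (score: 14)
A few ways to make it more efficient and more Pythonic:
1. Drop the set() construction, since the algorithm should already eliminate duplicates during the main loop.
2. Use yield to generate the values.
3. Move the tuple() call to the point where the final values are produced, so that you do not have to construct and throw away extra tuples, and reuse a list, saved, to store the current time range for comparison.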
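As a minimal sketch of how these three suggestions might fit together: the function name merge_ranges and every detail of the body are assumptions made for illustration, not the answerer's own code. It sorts once, keeps the range under construction in a reusable list called saved, and converts to a tuple only when a finished range is yielded.

```python
def merge_ranges(times):
    """Yield merged (start, end) tuples from an iterable of 2-item ranges.

    Hypothetical helper written for illustration only.
    """
    # Sort each pair so start <= end, then sort the pairs by start time.
    ordered = sorted(sorted(t) for t in times)
    if not ordered:
        return
    # Reuse one mutable list for the range currently being built.
    saved = list(ordered[0])
    for start, end in ordered[1:]:
        if start <= saved[1]:
            # Overlap (or duplicate): extend the current range if needed.
            saved[1] = max(saved[1], end)
        else:
            # Gap: emit the finished range and start a new one.
            yield tuple(saved)
            saved[0], saved[1] = start, end
    yield tuple(saved)
```

Because it is a generator, wrap the call in list() when a list is needed, for example list(merge_ranges([(1, 3), (2, 4), (5, 8)])).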
Answer 1 (score: 2)
Sort each tuple, then sort the list; if t1.right >= t2.left, merge the two and restart with the new list, ...
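def f(l, sort=True):
    # Sort each tuple, then the whole list (only on the first call).
    sl = sorted(tuple(sorted(i)) for i in l) if sort else l
    if len(sl) < 2:
        return sl
    if sl[0][1] >= sl[1][0]:
        # Overlap: merge the first two ranges and restart with the shorter list.
        merged = (sl[0][0], max(sl[0][1], sl[1][1]))
        return f([merged] + sl[2:], False)
    # No overlap: the first range is final, so merge the remainder.
    return [sl[0]] + f(sl[1:], False)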
Answer 2 (score: 1)
Sorting part: use the standard sort; it already compares tuples the right way.
sorted_tuples = sorted(initial_ranges)
Merging part: this also eliminates duplicate ranges, so there is no need for set. Suppose you have current_tuple and next_tuple.
c_start, c_end = current_tuple
n_start, n_end = next_tuple
if n_start <= c_end:
    merged_tuple = min(c_start, n_start), max(c_end, n_end)
I hope the logic is clear enough.
To peek at the next tuple, you can use indexed access into sorted_tuples; it is a fully known sequence anyway.
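The fragments above can be assembled into a complete function. The following is only a sketch under assumptions: the name merge_sorted and the loop structure are not from the answer, and it assumes each input tuple already has start <= end.

```python
def merge_sorted(initial_ranges):
    """Merge overlapping (start, end) tuples; hypothetical helper."""
    sorted_tuples = sorted(initial_ranges)
    merged = []
    i = 0
    while i < len(sorted_tuples):
        c_start, c_end = sorted_tuples[i]
        # Peek at the following tuples by index and absorb each one
        # that starts before the current range ends.
        while i + 1 < len(sorted_tuples) and sorted_tuples[i + 1][0] <= c_end:
            n_start, n_end = sorted_tuples[i + 1]
            c_start, c_end = min(c_start, n_start), max(c_end, n_end)
            i += 1
        merged.append((c_start, c_end))
        i += 1
    return merged
```

Duplicates are absorbed by the same overlap test, which is why no set is required.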
Answer 3 (score: 1)
Sort all the boundaries, then pick every pair in which a range end is followed by a range start.
def mergeOverlapping(initialranges):
    def allBoundaries():
        for r in initialranges:
            yield r[0], True
            yield r[1], False
    def getBoundaries(boundaries):
        yield boundaries[0][0]
        for i in range(1, len(boundaries) - 1):
            if not boundaries[i][1] and boundaries[i + 1][1]:
                yield boundaries[i][0]
                yield boundaries[i + 1][0]
        yield boundaries[-1][0]
    return getBoundaries(sorted(allBoundaries()))
Well, not that pretty, but it was at least fun to write!
Edit: years later, after an upvote, I realized my code was wrong! Here is the new version, just for fun:
def mergeOverlapping(initialRanges):
    def allBoundaries():
        # Mark each range start with -1 and each range end with 1.
        for r in initialRanges:
            yield r[0], -1
            yield r[1], 1
    def getBoundaries(boundaries):
        openrange = 0
        for value, boundary in boundaries:
            if not openrange:
                yield value
            openrange += boundary
            if not openrange:
                yield value
    def outputAsRanges(b):
        # Pair consecutive boundaries into (start, end) tuples.
        for start in b:
            yield (start, next(b))
    return outputAsRanges(getBoundaries(sorted(allBoundaries())))
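Basically, I mark each boundary with -1 or 1, sort the boundaries by value, and output one only when the balance between opening and closing brackets is zero.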
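To make the balance idea concrete, here is a small tracing sketch. The helper boundary_trace is hypothetical and exists only to show the running balance; it is not part of the answer.

```python
def boundary_trace(ranges):
    """Return (value, mark, balance_after) for each sorted boundary.

    Hypothetical helper: starts are marked -1 and ends 1, as in the answer.
    """
    events = sorted([(s, -1) for s, _ in ranges] + [(e, 1) for _, e in ranges])
    balance = 0
    trace = []
    for value, mark in events:
        balance += mark
        trace.append((value, mark, balance))
    return trace


# For [(1, 3), (2, 4), (5, 8)] the balance after each boundary is
# -1, -2, -1, 0, -1, 0: it returns to zero at 4 and at 8, which are
# exactly the ends of the merged ranges (1, 4) and (5, 8).
print(boundary_trace([(1, 3), (2, 4), (5, 8)]))
```

A start and an end at the same value sort with the start first (-1 before 1), so touching ranges such as (1, 3) and (3, 5) are treated as one range.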
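Answer 4 (score: 1)
Late to the party, but this might help someone looking for this. I had a similar problem, but with dictionaries. Given a list of time ranges, I wanted to find the overlaps and merge them where possible. A slight modification of @samplebias's answer did the job for me: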
Merge function:
def merge_range(ranges: list, start_key: str, end_key: str):
    # Sort once by start so that overlapping ranges become adjacent.
    ranges = sorted(ranges, key=lambda x: x[start_key])
    saved = dict(ranges[0])
    for range_set in ranges:
        if range_set[start_key] <= saved[end_key]:
            saved[end_key] = max(saved[end_key], range_set[end_key])
        else:
            yield dict(saved)
            saved[start_key] = range_set[start_key]
            saved[end_key] = range_set[end_key]
    yield dict(saved)
Data:
data = [
    {'start_time': '09:00:00', 'end_time': '11:30:00'},
    {'start_time': '15:00:00', 'end_time': '15:30:00'},
    {'start_time': '11:00:00', 'end_time': '14:30:00'},
    {'start_time': '09:30:00', 'end_time': '14:00:00'}
]
Execution:
print(list(merge_range(ranges=data, start_key='start_time', end_key='end_time')))
Output:
[{'start_time': '09:00:00', 'end_time': '14:30:00'}, {'start_time': '15:00:00', 'end_time': '15:30:00'}]
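One caveat: merge_range compares the time values as plain strings, which orders correctly only while every value is zero-padded in the same HH:MM:SS layout. If that cannot be guaranteed, the values can be parsed first. The sketch below is an assumption-laden variant, not part of the answer: to_time and merge_time_ranges are hypothetical names, and the default format string is a guess at the data layout.

```python
from datetime import datetime


def to_time(value, fmt="%H:%M:%S"):
    """Parse a clock string into a datetime.time (hypothetical helper)."""
    return datetime.strptime(value, fmt).time()


def merge_time_ranges(ranges, start_key="start_time", end_key="end_time"):
    """Merge overlapping dict ranges after parsing their time strings."""
    parsed = sorted((to_time(r[start_key]), to_time(r[end_key])) for r in ranges)
    merged = []
    for start, end in parsed:
        if merged and start <= merged[-1][1]:
            # Overlap: extend the last merged range if this one ends later.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Convert back to zero-padded strings for output.
    return [{start_key: s.isoformat(), end_key: e.isoformat()} for s, e in merged]
```

With this variant, an unpadded value such as '9:00:00' sorts before '10:00:00', which a plain string comparison would get wrong.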