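Suppose I have a list of lists containing indices: [[start, end], [start1, end1], [start2, end2]].
For example:
[[0, 133], [78, 100], [25, 30]]
How can I check for overlap between the lists and remove the longer list each time? So: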
>>> list = [[0, 133], [78, 100], [25, 30]]
>>> foo(list)
[[78, 100], [25, 30]]
Here's what I've done so far:
def cleanup_list(list):
    i = 0
    c = 0
    x = list[:]
    end = len(x)
    while i < end-1:
        for n in range(x[i][0], x[i][1]):
            if n in range(x[i+1][0], x[i+1][1]):
                list.remove(max(x[i], x[i+1]))
        i += 1
    return list
But besides being convoluted, it doesn't work properly:
>>> cleanup_list([[0,100],[9,10],[12,90]])
[[0, 100], [12, 90]]
Any help would be greatly appreciated!
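EDIT:
Here are some other examples:
>>> a = [[0, 100], [4, 20], [30, 35], [30, 78]]
>>> foo(a)
[[4, 20], [30, 35]]
>>> b = [[30, 70], [25, 40]]
>>> foo(b)
[[25, 40]]
Basically, I'm trying to remove all of the longest lists that overlap with another list. In this case I don't have to worry about lists of equal length.
Thanks!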
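Answer 0 (score: 10)
There is an O(n*log n) algorithm for removing the minimum number of intervals from the list so that the remaining intervals don't overlap: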
def maximize_nonoverlapping_count(intervals):
    # sort by the end-point (ties broken by length), in descending order
    # so that L.pop() returns the interval with the smallest end-point
    L = sorted(intervals, key=lambda iv: (iv[1], iv[1] - iv[0]),
               reverse=True)  # O(n*log n)
    # build_interval_tree() and remove() are placeholders for an
    # interval-tree implementation; this is an outline, not runnable as-is
    iv = build_interval_tree(intervals)  # O(n*log n)
    result = []
    while L:  # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(L.pop())  # O(1)
        # remove intervals that overlap with the popped interval
        overlapping_intervals = iv.pop(result[-1])  # O(log n + m)
        remove(overlapping_intervals, from_=L)
    return result
It should produce the following results:
f = maximize_nonoverlapping_count
assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]]
assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]]
assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]]
assert f([[30, 70], [25, 40]]) == [[25, 40]]
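For comparison, the same greedy rule (take the remaining interval with the smallest end-point, drop everything that overlaps it, repeat) can be sketched without an interval tree. Because candidates arrive in end-point order, it is enough to compare each start with the end of the last kept interval. This is the standard interval-scheduling greedy swapped in as a simplification, not the answer's tree-based code; `greedy_nonoverlapping` is a hypothetical name, and closed-interval semantics are assumed.

```python
def greedy_nonoverlapping(intervals):
    """Keep the interval with the smallest end-point, drop whatever overlaps it,
    and repeat. Intervals are treated as closed: [0, 10] and [10, 20] overlap."""
    result = []
    last_end = None
    # sort by end-point, breaking ties by length (same key as the outline)
    for start, end in sorted(intervals, key=lambda iv: (iv[1], iv[1] - iv[0])):
        # every later candidate ends at or after last_end, so it overlaps some
        # kept interval exactly when it starts at or before last_end
        if last_end is None or start > last_end:
            result.append([start, end])
            last_end = end
    return result


print(greedy_nonoverlapping([[0, 100], [4, 20], [30, 35], [30, 78]]))
```

The sort dominates, so this runs in O(n log n) and reproduces the four expected results listed above.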
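It needs a data structure that can find all intervals overlapping a given interval in O(log n + m) time (where m is the number of intervals reported), e.g., an IntervalTree. There are implementations that can be used from Python, such as quicksect.py; see Fast interval intersection methodologies for example usage.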
Here is a quicksect-based O(n**2) implementation of the above algorithm:
from functools import reduce  # reduce is not a builtin in Python 3
from quicksect import IntervalNode

class Interval(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.removed = False

def maximize_nonoverlapping_count(intervals):
    intervals = [Interval(start, end) for start, end in intervals]
    # sort by the end-point
    intervals.sort(key=lambda x: (x.end, (x.end - x.start)))  # O(n*log n)
    tree = build_interval_tree(intervals)  # O(n*log n)
    result = []
    for smallest in intervals:  # O(n) (without the loop body)
        # pop the interval with the smallest end-point, keep it in the result
        if smallest.removed:
            continue  # skip removed nodes
        smallest.removed = True
        result.append([smallest.start, smallest.end])  # O(1)
        # remove (mark) intervals that overlap with the popped interval
        tree.intersect(smallest.start, smallest.end,  # O(log n + m)
                       lambda x: setattr(x.other, 'removed', True))
    return result

def build_interval_tree(intervals):
    root = IntervalNode(intervals[0].start, intervals[0].end,
                        other=intervals[0])
    return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x),
                  intervals[1:], root)
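Note: the worst-case time complexity of this implementation is O(n**2), because intervals are only marked as removed, not deleted from the tree. For example, imagine an input intervals for which len(result) == len(intervals) / 3 and in which len(intervals) / 2 intervals span the whole range. Then tree.intersect() would be called n/3 times, and each call would execute x.other.removed = True at least n/2 times, i.e., n*n/6 operations in total:
n = 6
intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]]
result = [[0, 10], [10, 20]]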
Here is a banyan-based O(n log n) implementation:
from banyan import SortedSet, OverlappingIntervalsUpdator  # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point O(n log n)
    sorted_intervals = SortedSet(intervals,
                                 key=lambda iv: (iv[1], iv[1] - iv[0]))
    # build "interval" tree O(n log n)
    tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator)
    result = []
    while sorted_intervals:  # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(sorted_intervals.pop())  # O(log n)
        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1])  # O(m log n)
        tree -= overlapping_intervals  # O(m log n)
        sorted_intervals -= overlapping_intervals  # O(m log n)
    return result
Note: this implementation considers the intervals [0, 10] and [10, 20] to overlap:
f = maximize_nonoverlapping_count
assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10], [11, 20]]
assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10], [15, 40]]
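Whether touching endpoints count as overlap comes down to the comparison used. A minimal sketch of the two usual conventions; the function names are illustrative and not part of banyan or quicksect:

```python
def overlaps_closed(a, b):
    # closed intervals [start, end]: sharing an endpoint counts as overlap
    return a[0] <= b[1] and b[0] <= a[1]


def overlaps_half_open(a, b):
    # half-open intervals [start, end): touching endpoints do not overlap
    return a[0] < b[1] and b[0] < a[1]


print(overlaps_closed([0, 10], [10, 20]), overlaps_half_open([0, 10], [10, 20]))
```

The note above describes the closed behaviour; switching a greedy filter to the half-open test would let [0, 10] and [10, 20] both be kept.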
sorted_intervals and tree can be merged:
from banyan import SortedSet, OverlappingIntervalsUpdator  # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # build "interval" tree sorted by the end-point O(n log n)
    tree = SortedSet(intervals, key=lambda iv: (iv[1], iv[1] - iv[0]),
                     updator=OverlappingIntervalsUpdator)
    result = []
    while tree:  # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(tree.pop())  # O(log n)
        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1])  # O(m log n)
        tree -= overlapping_intervals  # O(m log n)
    return result
Answer 1 (score: 3)
This may not be the fastest solution, but I think it is really verbose:
a = [[2,100], [4,10], [77,99], [38,39], [44,80], [69,70], [88, 90]]

# build ranges first
def expand(intervals):
    # turn each [start, end] pair into an inclusive range of integers
    return [range(r[0], r[1] + 1) for r in intervals]

def compare(ranges):
    toBeDeleted = []
    for index1 in range(len(ranges)):
        for index2 in range(len(ranges)):
            if index1 == index2:
                # we don't want to compare a range with itself
                continue
            matches = [x for x in ranges[index1] if x in ranges[index2]]
            if len(matches) != 0:  # do we have overlap?
                # compare lengths and get rid of the longer one
                if len(ranges[index1]) > len(ranges[index2]):
                    toBeDeleted.append(index1)
                    break
                elif len(ranges[index1]) < len(ranges[index2]):
                    toBeDeleted.append(index2)
    # remove duplicates and delete from the highest index down, so that
    # earlier deletions don't shift the indices still waiting to be deleted
    for i in sorted(set(toBeDeleted), reverse=True):
        del ranges[i]
    return ranges

# convert the remaining ranges back to [start, end] pairs for printing
print([[r[0], r[-1]] for r in compare(expand(a))])
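Answer 2 (score: 2)
I think one problem in your code is that it doesn't handle the case where one list contains another. For example, [0,100] contains [9,10]. When you loop n over [0,100] and n reaches [9,10], the condition if n in range(x[i+1][0], x[i+1][1]) is triggered. Then the built-in max compares [0, 100] and [9, 10], and unfortunately max returns [9,10], because lists are compared element by element, starting with the first number. So you remove the wrong element.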
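A quick way to see the comparison problem described above, plus one possible fix for the max call: compare by interval length through the key argument. This alone does not make the original loop correct (for example, it can still call list.remove() more than once for the same pair).

```python
a, b = [0, 100], [9, 10]

# Lists compare element by element, so max() looks at the start values first:
# 0 < 9, hence [9, 10] counts as "larger" even though it is the shorter interval.
print(max(a, b))

# Comparing by length instead picks the interval that should be removed.
print(max(a, b, key=lambda iv: iv[1] - iv[0]))
```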
I tried a different way to get the result you want: instead of manipulating the list itself, I build a new list and conditionally add each element to it if it meets our requirements.
def cleanup_list(lists):
    ranges = []
    for l in lists:
        to_insert = True
        # iterate over a copy, because ranges may be modified inside the loop
        for i in ranges[:]:
            r = range(i[0], i[1])
            # if l overlaps with i, but l does not contain i
            if l[0] in r or l[1] in r:
                if (l[1] - l[0]) < len(r):
                    ranges.remove(i)
                else:
                    to_insert = False
            # l contains i
            if l[0] < i[0] and l[1] > i[1]:
                to_insert = False
        if to_insert:
            ranges.append(l)
    return ranges
Answer 3 (score: 1)
Sort all items by length in ascending order.
Add them to a segment tree, ignoring the items that overlap.
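A minimal sketch of this idea, with one swap: instead of a segment tree it keeps the accepted intervals in a start-sorted Python list and uses bisect to check only the two neighbours of each candidate. That is enough because accepted intervals never overlap each other. Closed-interval semantics are assumed, and `keep_shortest_first` is a hypothetical name. Because list.insert is O(n), this is O(n**2) in the worst case; a balanced search tree (or the segment tree the answer suggests) would bring it to O(n log n).

```python
from bisect import bisect_right


def keep_shortest_first(intervals):
    """Visit intervals from shortest to longest and keep each one that does
    not overlap (closed semantics) any interval kept so far."""
    starts = []  # starts of kept intervals, kept sorted
    kept = []    # kept intervals, sorted by start and pairwise disjoint
    for s, e in sorted(intervals, key=lambda iv: iv[1] - iv[0]):
        idx = bisect_right(starts, s)  # kept[:idx] start at or before s
        # nearest kept interval starting at or before s overlaps if it reaches s
        if idx > 0 and kept[idx - 1][1] >= s:
            continue
        # nearest kept interval starting after s overlaps if it starts by e
        if idx < len(kept) and kept[idx][0] <= e:
            continue
        starts.insert(idx, s)
        kept.insert(idx, [s, e])
    return kept


print(keep_shortest_first([[0, 100], [4, 20], [30, 35], [30, 78]]))
```

It reproduces the question's examples, but shortest-first does not always keep the largest possible number of intervals: for [[0, 5], [4, 7], [6, 11]] it keeps only [[4, 7]].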
Answer 4 (score: 1)
In general, two intervals overlap if:
min([upperBoundOfA, upperBoundOfB]) >= max([lowerBoundOfA, lowerBoundOfB])
If that is the case, the union of those intervals is:
(min([lowerBoundOfA, lowerBoundOfB]), max([upperBoundOfA, upperBoundOfB]))
Likewise, the intersection of those intervals is:
(max([lowerBoundOfA, lowerBoundOfB]), min([upperBoundOfA, upperBoundOfB]))
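The three formulas translate directly into small helpers. This is a sketch with illustrative names, taking intervals as [lower, upper] pairs and using the answer's closed overlap condition:

```python
def overlaps(a, b):
    # the answer's condition: the smaller upper bound reaches the larger lower bound
    return min(a[1], b[1]) >= max(a[0], b[0])


def union(a, b):
    # only meaningful when the intervals overlap; otherwise it would cover the gap
    if not overlaps(a, b):
        raise ValueError("intervals do not overlap")
    return (min(a[0], b[0]), max(a[1], b[1]))


def intersection(a, b):
    # returns None when the intervals have no common part
    if not overlaps(a, b):
        return None
    return (max(a[0], b[0]), min(a[1], b[1]))


print(union([30, 70], [25, 40]), intersection([30, 70], [25, 40]))
```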