I have lists with the following structure:
List structure: [(id, start, end), (id, start, end), (id, start, end)]
For example, they might look like this:
List1 = [(1,50,56),(1,61,69),(1,70,87),(1,90,99),(1,110,117),(1,119,126),(2,3,9), (2,11,17), (3,2,9)]
List2 = [(1,44,56),(1,59,64),(1,70,81),(1,84,90),(1,99,155), (2,5,15), (3,3,9)]
I need to find the regions where they overlap.
I tried a brute-force approach with the following code:
for a1, s1, e1 in List1:
    for a2, s2, e2 in List2:
        sgroup = [s1, s2]
        egroup = [e1, e2]
        mstart = max(sgroup)
        mend = min(egroup)
        if a1 == a2 and e2>=s1 and s2<=e1:
            t = (mstart, mend)
            print(t)
Can anyone help me speed this up? I need an algorithm that runs faster than this brute-force approach.
Answer 0 (score: 1)
Snippet 2:
for a1, s1, e1 in List1:
    for a2, s2, e2 in List2:
        if a1 == a2 and s2 <= e1 and e2 >= s1:
            print(max(s1, s2), min(e1, e2))
import time
def group1():
    res = []
    for a1, s1, e1 in List1:
        for a2, s2, e2 in List2:
            sgroup = [s1, s2]
            egroup = [e1, e2]
            mstart = max(sgroup)
            mend = min(egroup)
            if a1 == a2 and e2>=s1 and s2<=e1:
                t = (mstart, mend)
                res.append(t)
    return res

def group2():
    res = []
    for a1, s1, e1 in List1:
        for a2, s2, e2 in List2:
            if a1 == a2 and s2 <= e1 and e2 >= s1:
                res.append((max(s1, s2), min(e1, e2)))
    return res
List1 = [(1,50,56),(1,61,69),(1,70,87),(1,90,99),(1,110,117),(1,119,126),(2,3,9), (2,11,17), (3,2,9)]
List2 = [(1,44,56),(1,59,64),(1,70,81),(1,84,90),(1,99,155), (2,5,15), (3,3,9)]
for func in [group1, group2]:
    start = time.time()
    func()
    end = time.time()
    print(f'{func.__name__}: {end - start}')
    print(func())
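Answer 1 (score: 0)
It appears that you are already guaranteed that the lists are sorted by ID and start time, and that there are no overlaps within a list. Essentially, each ID is a separate list for the purposes of overlap detection.
You walk through the pair of lists much as you would when merging them (which is essentially what this is). Keep an index into each list (index_a, index_b); on each iteration, work with the smaller of the two referenced values (the start values at those indices).
To process one interval, the element at list_a[index_a]: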
check_b
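The merge-style walk described in this answer could be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name `merge_overlaps` is made up here, and it assumes both lists are sorted by (id, start) and that ranges within a single list never overlap.

```python
List1 = [(1, 50, 56), (1, 61, 69), (1, 70, 87), (1, 90, 99), (1, 110, 117),
         (1, 119, 126), (2, 3, 9), (2, 11, 17), (3, 2, 9)]
List2 = [(1, 44, 56), (1, 59, 64), (1, 70, 81), (1, 84, 90), (1, 99, 155),
         (2, 5, 15), (3, 3, 9)]


def merge_overlaps(list_a, list_b):
    """Hypothetical helper: two-index walk over two sorted range lists."""
    result = []
    index_a = index_b = 0
    while index_a < len(list_a) and index_b < len(list_b):
        id_a, start_a, end_a = list_a[index_a]
        id_b, start_b, end_b = list_b[index_b]
        if id_a != id_b:
            # Skip ahead in whichever list is still on the smaller ID.
            if id_a < id_b:
                index_a += 1
            else:
                index_b += 1
            continue
        low, high = max(start_a, start_b), min(end_a, end_b)
        if low <= high:
            result.append((low, high))
        # The range that ends first cannot overlap anything later in the
        # other list, so it is the one to move past.
        if end_a < end_b:
            index_a += 1
        else:
            index_b += 1
    return result


print(merge_overlaps(List1, List2))
```

Each index only ever moves forward, so after sorting the walk is linear in the combined length of the two lists.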
Answer 2 (score: 0)
If you want to use a list comprehension, you can do this:
List1 = [(1,50,56),(1,61,69),(1,70,87),(1,90,99),(1,110,117),(1,119,126),(2,3,9), (2,11,17), (3,2,9)]
List2 = [(1,44,56),(1,59,64),(1,70,81),(1,84,90),(1,99,155), (2,5,15), (3,3,9)]
output = [(max(s1, s2), min(e1, e2))
          for id1, s1, e1 in List1
          for id2, s2, e2 in List2
          if id1 == id2 and e2 >= s1 and s2 <= e1]
print(output)
The output is:
[(50, 56), (61, 64), (70, 81), (84, 87), (90, 90), (99, 99), (110, 117), (119, 126), (5, 9), (11, 15), (3, 9)]
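This appears to match the output of the original code. In general, a list comprehension tends to be faster than the equivalent standard loop.
List comprehension: 6.51 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Original loop: 25.2 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(The comparison code for the "original loop" was: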
output = []
for a1, s1, e1 in List1:
    for a2, s2, e2 in List2:
        sgroup = [s1, s2]
        egroup = [e1, e2]
        mstart = max(sgroup)
        mend = min(egroup)
        if a1 == a2 and e2>=s1 and s2<=e1:
            output.append((mstart, mend))
)
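The figures above appear to come from IPython's %timeit. Outside IPython, a comparable measurement could be taken with the standard-library timeit module, roughly as sketched below. The function names `with_comprehension` and `with_loop` are made up for this sketch, and the absolute numbers will differ from machine to machine.

```python
import timeit

List1 = [(1, 50, 56), (1, 61, 69), (1, 70, 87), (1, 90, 99), (1, 110, 117),
         (1, 119, 126), (2, 3, 9), (2, 11, 17), (3, 2, 9)]
List2 = [(1, 44, 56), (1, 59, 64), (1, 70, 81), (1, 84, 90), (1, 99, 155),
         (2, 5, 15), (3, 3, 9)]


def with_comprehension():
    return [(max(s1, s2), min(e1, e2))
            for id1, s1, e1 in List1
            for id2, s2, e2 in List2
            if id1 == id2 and e2 >= s1 and s2 <= e1]


def with_loop():
    output = []
    for a1, s1, e1 in List1:
        for a2, s2, e2 in List2:
            mstart = max([s1, s2])
            mend = min([e1, e2])
            if a1 == a2 and e2 >= s1 and s2 <= e1:
                output.append((mstart, mend))
    return output


# Make sure both variants agree before comparing their speed.
assert with_comprehension() == with_loop()

NUMBER = 2000  # calls per timing run (an arbitrary choice for this sketch)
for func in (with_comprehension, with_loop):
    # Take the best of several runs to reduce noise from other processes.
    best = min(timeit.repeat(func, number=NUMBER, repeat=5)) / NUMBER
    print(f"{func.__name__}: {best * 1e6:.2f} µs per call")
```

Note that part of the gap measured above may come from the original loop building two temporary lists and calling max and min for every pair, including pairs that are then rejected, rather than from the comprehension syntax alone.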
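Answer 3 (score: 0)
Well... this is extreme overkill (I was having fun, leave me alone!), but it runs about three times faster than the simple algorithm.
The real advantage of this approach is that it scales to very large lists, and apart from the sorted copies of the inputs it yields results lazily instead of building them up in memory. So try it with lists of 100 or 1000 items and you should see a bigger improvement.
I don't assume the lists are sorted, so, assuming Python's sort algorithm is good, the running time should be dominated by the O(n log n) sort.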
from itertools import groupby
from operator import itemgetter
def get_list_overlaps(list_a, list_b):
    for range_id, (a_ranges, b_ranges) in align_lists(list_a, list_b):
        a_range = next(a_ranges)
        b_range = next(b_ranges)
        try:
            while a_range and b_range:
                overlap = get_overlap(a_range, b_range)
                if overlap:
                    yield overlap
                    # If we overlap, discard the one which ends earliest
                    if a_range[2] < b_range[2]:
                        a_range = next(a_ranges)
                    else:
                        b_range = next(b_ranges)
                else:
                    # If not, discard the one which starts earliest
                    if a_range[1] < b_range[1]:
                        a_range = next(a_ranges)
                    else:
                        b_range = next(b_ranges)
        except StopIteration:
            continue

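def align_lists(list_a, list_b):
    b_grouped = groupby(sorted(list_b), key=itemgetter(0))
    first = next(b_grouped, None)
    if first is None:
        return  # list_b is empty, so nothing can overlap
    b_id, b_intervals = first
    for a_id, a_intervals in groupby(sorted(list_a), key=itemgetter(0)):
        # Advance list_b until its ID catches up with the current list_a ID
        try:
            while a_id > b_id:
                b_id, b_intervals = next(b_grouped)
        except StopIteration:
            break
        # Only yield when both lists are on the same ID; an ID that is
        # missing from either list is skipped
        if a_id == b_id:
            yield a_id, (a_intervals, b_intervals)
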
def get_overlap(a_range, b_range):
    _, a_start, a_end = a_range
    _, b_start, b_end = b_range
    # If either ends before the other starts, no overlap
    if b_end < a_start or a_end < b_start:
        return
    return max(a_start, b_start), min(a_end, b_end)

# -------------------------------------------------------------------- #
List1 = [(1, 50, 56), (1, 61, 69), (1, 70, 87), (1, 90, 99), (1, 110, 117),
(1, 119, 126), (2, 3, 9), (2, 11, 17), (3, 2, 9)]
List2 = [(1, 44, 56), (1, 59, 64), (1, 70, 81), (1, 84, 90), (1, 99, 155),
(2, 5, 15), (3, 3, 9)]
for overlap in get_list_overlaps(List1, List2):
    print(overlap)
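The idea is to group by ID first, so that we never bother comparing things with different IDs, and then to walk through the ranges within each ID, throwing them away as we go depending on whether or not they overlap.
You could probably optimize this further by inlining some of the functions, and so on.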
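As a small illustration of the grouping step: itertools.groupby only merges adjacent items with equal keys, which is why the code above sorts before grouping. The sample data and variable names below are invented for this sketch.

```python
from itertools import groupby
from operator import itemgetter

# Deliberately unsorted sample data (invented for this illustration).
ranges = [(2, 3, 9), (1, 61, 69), (1, 50, 56), (2, 11, 17)]

# Sorting first puts equal IDs next to each other, so each ID forms one group.
grouped = {
    range_id: [(start, end) for _, start, end in group]
    for range_id, group in groupby(sorted(ranges), key=itemgetter(0))
}
print(grouped)  # {1: [(50, 56), (61, 69)], 2: [(3, 9), (11, 17)]}

# Without sorting, the same ID can show up as several separate groups.
unsorted_keys = [range_id for range_id, _ in groupby(ranges, key=itemgetter(0))]
print(unsorted_keys)  # [2, 1, 2]
```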
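Answer 4 (score: 0)
A few observations here:
Ranges with different IDs cannot overlap. This means we can partition the problem by ID, and then solve each partition separately while ignoring the ID.
If the lists are sorted, you do not need to check every range against every other range; instead, you can, on seeing e1
What you are looking for here is a sliding-window algorithm, not unlike the one you see in TCP.
So, putting it all together: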
from itertools import groupby
List1 = [(1,50,56),(1,61,69),(1,70,87),(1,90,99),(1,110,117),(1,119,126),(2,3,9), (2,11,17), (3,2,9)]
List2 = [(1,44,56),(1,59,64),(1,70,81),(1,84,90),(1,99,155), (2,5,15), (3,3,9)]
def partition(lst):
    part = groupby(lst, lambda el: el[0])
    return {id: list(el) for id, el in part}

# you may be able to skip sorted() here if you know your input is already sorted
List1_part = partition(sorted(List1))
List2_part = partition(sorted(List2))

for id in set(List1_part) & set(List2_part):
    window_size = max((e-s) for _, s, e in List2_part[id])
    window = 0
    for r1 in List1_part[id]:
        for r2 in List2_part[id][window:]:
            _, s1, e1 = r1
            _, s2, e2 = r2
            if e1 < s2:
                break
            elif e2 >= s1:
                print(id, max(s1, s2), min(e1, e2))
            elif s2 + window_size < s1:
                window += 1
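One way to gain some confidence in the windowed loop is to wrap it in a function and compare it with the brute-force version on random input. The sketch below does that; `brute_force`, `windowed`, and `random_ranges` are names made up for this check, and the generator assumes what the answer assumes, namely that ranges sharing an ID within one list do not overlap.

```python
import random
from itertools import groupby


def brute_force(list_a, list_b):
    """Reference result: compare every range with every other range."""
    return sorted(
        (id_a, max(s1, s2), min(e1, e2))
        for id_a, s1, e1 in list_a
        for id_b, s2, e2 in list_b
        if id_a == id_b and s2 <= e1 and e2 >= s1
    )


def partition(lst):
    # groupby needs sorted input so that each ID forms a single group.
    return {key: list(group)
            for key, group in groupby(sorted(lst), key=lambda el: el[0])}


def windowed(list_a, list_b):
    """The windowed loop from above, collecting results instead of printing."""
    part_a, part_b = partition(list_a), partition(list_b)
    result = []
    for key in set(part_a) & set(part_b):
        window_size = max(e - s for _, s, e in part_b[key])
        window = 0
        for _, s1, e1 in part_a[key]:
            for _, s2, e2 in part_b[key][window:]:
                if e1 < s2:
                    break
                elif e2 >= s1:
                    result.append((key, max(s1, s2), min(e1, e2)))
                elif s2 + window_size < s1:
                    window += 1
    return sorted(result)


def random_ranges(rng, ids=3, per_id=20):
    """Hypothetical generator of non-overlapping (id, start, end) ranges."""
    ranges = []
    for range_id in range(1, ids + 1):
        position = 0
        for _ in range(per_id):
            start = position + rng.randint(1, 10)  # always after the last end
            end = start + rng.randint(0, 10)
            ranges.append((range_id, start, end))
            position = end
    return ranges


rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(50):
    list_a, list_b = random_ranges(rng), random_ranges(rng)
    assert windowed(list_a, list_b) == brute_force(list_a, list_b)
print("windowed and brute-force results agree")
```

Results are sorted before comparison because iteration order over a set of IDs is not guaranteed.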