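I have a long list of x and y values, sorted by x. I would like to output a list of the longest continuous spans of x and y values. This is a bit hard to put into words, but hopefully the following example makes it clear: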
0, 148
0, 145
0, 186
0, 5768
600, 2374
2376, 2415
3000, 4315
6000, 6616
6000, 6799
6000, 7262
Since the region between 5768 and 6000 is not covered by any entry, the above should output:
0, 5768
6000, 7262
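This seems like it should be a simple problem, but I have been working on it for a while without finding a solution. I have posted my code below. The problem I am currently struggling with is that, because the rows are sorted by x, the x value of row k may exceed the y value of row k-1 without marking the start of a new contiguous span.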
lines = [line.strip('\n') for line in open('test')]
myarray=[]
for line in lines:
    myarray.append(line.split(', '))

def findCoveredRegions(regionArray):
    resultsContigs = []
    j = regionArray[0][1]
    i = regionArray[0][0]
    for line in regionArray:
        last_i = i
        i = line[0]
        if i <= j:
            if line[1] > j:
                j = line[1]
        else:
            resultsContigs.append([last_i,j])
    resultsContigs.append([i,regionArray[len(regionArray)-1][1]])
    return resultsContigs

print findCoveredRegions(myarray)
Answer 0 (score: 2)
Here is a numpy solution:
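import numpy as np

# Assumes myarray holds numeric (start, end) pairs, not the strings produced by split().
myarray = np.asanyarray(myarray)
# Sort all boundaries together; even flat indices are starts, odd ones are ends.
order = np.argsort(myarray.ravel())
# Running count of open intervals: +1 at each start, -1 at each end.
coverage = np.add.accumulate(1 - 2*(order%2))
# Coverage returns to zero at the right edge of each covered span.
gaps = np.where(coverage==0)[0]
left = order[np.r_[0, gaps[:-1] + 1]]
right = order[gaps]
result = myarray.ravel()[np.c_[left, right]]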
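It pools and sorts all the boundaries, then walks from left to right counting the left (+1) and right (-1) boundaries it encounters. This running count is never negative and drops to zero only where a gap follows. The covered intervals are then reconstructed from the positions of those gaps.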
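The same boundary-counting idea can be sketched without numpy, which may make the steps easier to follow. This is only an illustration: the function name `covered_spans` and the tie-breaking rule (starts sort before ends at the same position, so touching intervals are merged) are assumptions of this sketch, not part of the answer above.

```python
def covered_spans(intervals):
    """Merge intervals by sweeping over their sorted boundaries."""
    # Tag each boundary: +1 opens an interval, -1 closes one.
    events = []
    for start, end in intervals:
        events.append((start, 0, +1))
        events.append((end, 1, -1))
    # Sort by position; at equal positions starts (0) come before ends (1),
    # so intervals that merely touch are treated as one span (an assumption).
    events.sort()

    spans = []
    depth = 0          # number of intervals currently open
    span_start = None
    for position, _, delta in events:
        if depth == 0:
            span_start = position  # coverage begins again after a gap
        depth += delta
        if depth == 0:
            spans.append((span_start, position))  # coverage dropped to zero
    return spans


example = [(0, 148), (0, 145), (0, 186), (0, 5768), (600, 2374),
           (2376, 2415), (3000, 4315), (6000, 6616), (6000, 6799), (6000, 7262)]
print(covered_spans(example))  # [(0, 5768), (6000, 7262)]
```

The explicit tie-break matters only when an end and a start share the same value; the question's data contains no such case.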
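Answer 1 (score: 2)
This won't be particularly fast, but I think it is quite Pythonic and readable. It neither requires nor makes use of a sorted list of intervals.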
intervals = [(0, 148),
             (0, 145),
             (0, 186),
             (0, 5768),
             (600, 2374),
             (2376, 2415),
             (3000, 4315),
             (6000, 6616),
             (6000, 6799),
             (6000, 7262)]

def intersect(interval_a, interval_b):
    """Return whether two intervals intersect"""
    (a_bottom, a_top), (b_bottom, b_top) = interval_a, interval_b
    return a_bottom <= b_top and b_bottom <= a_top

def union_one_one(interval_a, interval_b):
    """Return the union of two intervals"""
    (a_bottom, a_top), (b_bottom, b_top) = interval_a, interval_b
    return min(a_bottom, b_bottom), max(a_top, b_top)

def union_many_one(old_intervals, new_interval):
    """Return the union of a new interval with several old intervals."""
    result = []
    for old_interval in old_intervals:
        # If an old interval intersects with the new interval, merge the old interval into the new one.
        if intersect(old_interval, new_interval):
            new_interval = union_one_one(old_interval, new_interval)
        # Otherwise, leave the old interval alone.
        else:
            result.append(old_interval)
    result.append(new_interval)
    return result

def union_all(intervals):
    """Return the union of a collection of intervals"""
    result = []
    for interval in intervals:
        result = union_many_one(result, interval)
    return result

print(union_all(intervals))
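When the input is already sorted by x, as in the question, a single pass that compares each interval with the last merged span should be enough. The following is a minimal sketch under that assumption; `merge_sorted` is a hypothetical name, and the shortened `rows` list stands in for the lines of the question's input file. It also converts the values to integers, since comparing the raw strings from `split` would be lexicographic.

```python
def merge_sorted(intervals):
    """Merge intervals that are already sorted by their start value."""
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the current span: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Starts beyond everything seen so far: open a new span.
            merged.append([start, end])
    return [tuple(span) for span in merged]


# Parse lines such as "600, 2374" into integers; as strings, '600' > '5768'.
rows = ["0, 148", "0, 5768", "600, 2374", "6000, 6616", "6000, 7262"]
pairs = [tuple(int(v) for v in row.split(", ")) for row in rows]
print(merge_sorted(pairs))  # [(0, 5768), (6000, 7262)]
```

Unlike `union_all`, this sketch gives wrong results on unsorted input; sorting the pairs first would remove that restriction.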