I am trying to find a faster way to filter my list of ranges so that any range that is completely covered by a larger range is excluded. For example:
# All ranges have width > 1, so there is no case like xx = [1, 1] in my list.
# Each range is already sorted, e.g. [1, 2, 3], never [1, 3, 2].
# Each range contains only consecutive integers, e.g. [3, 4, 5, 6, 7], never [3, 5, 7], so the first and last integer alone determine the whole range.
aa=[1,2,3]
bb=[2,3,4]
cc=[1,2]
dd=[0,1,2]
RangeList=[aa,bb,cc,dd]
#FinalList=[aa,bb,dd]
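cc is covered by aa or by dd (I think of it as a subset), so I want to exclude it. I could certainly write a loop that does n^2 comparisons, but I am hoping for a faster method because I have a lot of these ranges.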
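For reference, here is a minimal sketch of the n^2 baseline mentioned in the question, so there is something to compare faster approaches against. The function name `drop_covered_bruteforce` is made up for illustration, and the choice to keep only the first copy of exact duplicates is an assumption the question does not address.

```python
def drop_covered_bruteforce(ranges):
    """Keep only the ranges not covered by another range (O(n^2) baseline)."""
    # Each range is contiguous, so (first, last) describes it completely.
    spans = [(r[0], r[-1]) for r in ranges]
    kept = []
    for i, (start_i, end_i) in enumerate(spans):
        covered = False
        for j, (start_j, end_j) in enumerate(spans):
            if i == j:
                continue
            contains = start_j <= start_i and end_i <= end_j
            # Assumption: for exact duplicates, only the earliest copy survives.
            if contains and ((start_j, end_j) != (start_i, end_i) or j < i):
                covered = True
                break
        if not covered:
            kept.append(ranges[i])
    return kept

aa = [1, 2, 3]
bb = [2, 3, 4]
cc = [1, 2]
dd = [0, 1, 2]
print(drop_covered_bruteforce([aa, bb, cc, dd]))  # [[1, 2, 3], [2, 3, 4], [0, 1, 2]]
```

The input order is preserved, which matches the FinalList=[aa,bb,dd] given in the question.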
Answer 0 (score: 4)
You can solve this by sorting first:
import operator
ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
# Two stable sorts: by last element descending, then by first element ascending
sorted_ranges = sorted(ranges, key=operator.itemgetter(-1), reverse=True)
sorted_ranges = sorted(sorted_ranges, key=operator.itemgetter(0))
filtered = []
i = 0
while i < len(sorted_ranges):
    filtered.append(sorted_ranges[i])
    j = i + 1
    # Drop every following range that ends no later than the current one
    while j < len(sorted_ranges) and sorted_ranges[i][-1] >= sorted_ranges[j][-1]:
        print("Remove", sorted_ranges.pop(j), "dominated by", sorted_ranges[i])
    i += 1
print("RESULT", filtered)
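You need to sort by the first element in ascending order and, for equal first elements, by the last element in descending order. I used two explicit sorted calls, but you can do it in one pass by defining your own comparison function (in Python 3 the cmp argument is gone, so the function is wrapped with functools.cmp_to_key):
import functools
sorted_ranges = sorted(ranges, key=functools.cmp_to_key(lambda x, y: (x[0] - y[0]) if x[0] != y[0] else (y[-1] - x[-1])))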
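This way the dominating ranges come first. Note that after sorting, the nested while loops have O(n) complexity overall, because each element is examined only once and is either removed or added to the final list. The complexity of the whole algorithm is therefore O(n log n).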
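The same idea can be sketched in Python 3 with a single sort key and a running maximum of the end points, without popping from the list. This is a sketch rather than the answerer's exact code: the name `drop_covered` is made up, and negating the last element in the key assumes the ranges hold numbers.

```python
def drop_covered(ranges):
    """Drop every range that is covered by another one, in O(n log n)."""
    # Start ascending; for equal starts, the longer range (larger end) first.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[-1]))
    kept = []
    max_end = None  # largest end point among the ranges kept so far
    for r in ordered:
        # Every earlier range starts no later than r, so r is covered
        # exactly when some earlier range also ends no earlier than r.
        if max_end is None or r[-1] > max_end:
            kept.append(r)
            max_end = r[-1]
    return kept

ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
print(drop_covered(ranges))  # [[0, 1, 2, 3, 4], [3, 4, 5, 6], [6, 7]]
```

Tracking only `max_end` is enough because the kept ranges have strictly increasing end points, so the most recently kept range is always the one that reaches furthest.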
Answer 1 (score: 0)
Using sets, issubset(), and filter():
ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
# Use 'frozenset' as it is hashable to put in a big 'set'
sets = set(frozenset(a) for a in ranges)
def f(x):
    # Keep x only if it is not a subset of some other set
    for y in sets:
        if x == y:
            continue
        if x.issubset(y):
            return False
    return True
result = [list(a) for a in filter(f, sets)]
print('Result=', result)
The function f filters out any set that is a subset of another set in the input.
Result= [[3, 4, 5, 6], [0, 1, 2, 3, 4], [6, 7]]
No performance testing was done, though.
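Since no timings were given, here is a rough sketch of how the set-based filter could be checked against a sort-based one on random data and timed with timeit. Everything in it is an assumption made for illustration: the helper names, the data sizes, and the fixed seed. It prints whatever timings your machine produces and makes no claim about what they will be.

```python
import random
import timeit

def filter_with_sets(ranges):
    # Pairwise subset test, as in the set-based answer (quadratic).
    sets = {frozenset(r) for r in ranges}
    return {x for x in sets if not any(x < y for y in sets)}  # '<' is proper subset

def filter_with_sort(ranges):
    # Sort by start ascending and end descending, then sweep once.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[-1]))
    kept, max_end = [], None
    for r in ordered:
        if max_end is None or r[-1] > max_end:
            kept.append(r)
            max_end = r[-1]
    return {frozenset(r) for r in kept}

def random_ranges(n, limit=1000, max_width=20, seed=0):
    # Hypothetical test data: contiguous integer ranges with 2..max_width elements.
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        start = rng.randrange(limit)
        out.append(list(range(start, start + rng.randint(2, max_width))))
    return out

data = random_ranges(500)
# Both approaches should agree on which ranges survive.
assert filter_with_sets(data) == filter_with_sort(data)
for fn in (filter_with_sets, filter_with_sort):
    print(fn.__name__, timeit.timeit(lambda: fn(data), number=3))
```

Both functions return sets of frozensets so that the comparison ignores ordering and duplicate ranges.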
Answer 2 (score: 0)
My first thought was:
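compressed = dict()
# Sort by start, then by end, both descending, so the longest range for each start comes first
for lst in sorted(RangeList, reverse=True, key=lambda x: (x[0], x[-1])):
    key = lst[0]
    if key not in compressed:
        compressed[key] = lst
print(list(compressed.values()))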
But as igon pointed out, it misses interior subsets. I think the following fixes that:
# With reverse=True this key sorts by start ascending, then by end descending
RangeList = sorted(RangeList, reverse=True, key=lambda x: (-x[0], x[-1]))
lst = RangeList[0]
oldstart = lst[0]
oldend = lst[-1]
compressed = {oldstart: lst}
for lst in RangeList[1:]:
    start = lst[0]
    end = lst[-1]
    # Keep a range only if it extends past the furthest end kept so far
    if start not in compressed and oldend < end:
        compressed[start] = lst
        oldstart, oldend = start, end
print(list(compressed.values()))
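Answer 3 (score: 0)
This uses set and issubset, but first sorts the list by size; the filter function then walks the input again in reverse order (largest first) to try to speed up the search. This may improve the running time in practice, although the worst case is still quadratic in the number of ranges.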
ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
sets = [set(a) for a in ranges]
sets.sort(key=len)
# Largest sets first, so a containing set tends to be found early
reverse_sets = sets[:]
reverse_sets.reverse()
def f(x):
    for y in reverse_sets:
        if x == y:
            continue
        if x.issubset(y):
            return False
    return True
print('Result=', [list(a) for a in filter(f, sets)])
Result:
Result= [[6, 7], [3, 4, 5, 6], [0, 1, 2, 3, 4]]
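One possible way to push the size-ordering idea further, sketched here as a variation and not as the answerer's code: process the sets from largest to smallest and compare each one only against the sets already kept. Any set that covers x is at least as large as x, so it has already been seen, and it is either kept itself or covered by a kept set. The name `drop_subsets_by_size` is made up, and removing duplicates up front is an assumption.

```python
def drop_subsets_by_size(ranges):
    """Keep only maximal ranges, comparing each set against kept sets only."""
    # Deduplicate, then visit the largest sets first.
    sets = sorted({frozenset(r) for r in ranges}, key=len, reverse=True)
    kept = []
    for x in sets:
        # '<' on frozensets is the proper-subset test.
        if not any(x < y for y in kept):
            kept.append(x)
    return [sorted(s) for s in kept]

ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
print(drop_subsets_by_size(ranges))  # [[0, 1, 2, 3, 4], [3, 4, 5, 6], [6, 7]]
```

The work is then proportional to n times the number of surviving ranges, which helps when most ranges are covered but is still quadratic when none are.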