Filtering a list of integer ranges to exclude subsets in Python

Time: 2015-04-23 21:16:20

Tags: python range subset

I'm trying to find a faster way to filter my list of ranges so that any range that is fully covered by a larger range is excluded. For example,

#all ranges have width >1, which means no such case like xx=[1,1] in my list
#each range itself is sorted. E.g. no such case like [1,3,2]. It is already like [1,2,3]
#each range only contains continuous integers. E.g. no such case like [3,5,7], it will only be like [3,4,5,6,7]. In fact, you could simply consider the first and last integer of the range to know the whole range.
aa=[1,2,3]
bb=[2,3,4]
cc=[1,2]
dd=[0,1,2]

RangeList=[aa,bb,cc,dd]

#FinalList=[aa,bb,dd]

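cc is covered by aa or dd (I think of it as a subset), so I want to exclude it. I could certainly write a loop that does n^2 comparisons, but I'm hoping for a faster method, since I have a lot of these ranges.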
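For reference, the n^2 loop mentioned above could look like the minimal sketch below. It assumes, as stated in the question, that each range is a sorted run of consecutive integers, so a range a is covered by b exactly when b[0] <= a[0] and a[-1] <= b[-1]. The function name filter_covered_bruteforce is only illustrative, and keeping the first copy of identical ranges is an assumption about what you want for duplicates.

```python
def filter_covered_bruteforce(ranges):
    """Return the ranges that no other range covers, in input order (O(n^2))."""
    result = []
    for i, a in enumerate(ranges):
        covered = False
        for j, b in enumerate(ranges):
            if i == j:
                continue
            # With contiguous, sorted ranges, only the endpoints matter.
            if b[0] <= a[0] and a[-1] <= b[-1]:
                identical = b[0] == a[0] and b[-1] == a[-1]
                # For identical ranges, keep only the first occurrence.
                if not identical or j < i:
                    covered = True
                    break
        if not covered:
            result.append(a)
    return result


aa = [1, 2, 3]
bb = [2, 3, 4]
cc = [1, 2]
dd = [0, 1, 2]
print(filter_covered_bruteforce([aa, bb, cc, dd]))  # [[1, 2, 3], [2, 3, 4], [0, 1, 2]]
```

This is handy as a correctness baseline for the faster approaches in the answers.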

4 answers:

Answer 0 (score: 4)

You can solve this by sorting first:

import operator

ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
# Two stable sorts: by last element descending, then by first element ascending,
# so ranges that share a start appear longest first.
sorted_ranges = sorted(ranges, key=operator.itemgetter(-1), reverse=True)
sorted_ranges = sorted(sorted_ranges, key=operator.itemgetter(0))

filtered = []
i = 0

while i < len(sorted_ranges):
    filtered.append(sorted_ranges[i])
    j = i + 1
    # A following range starts no earlier, so if it also ends no later it is covered.
    while j < len(sorted_ranges) and sorted_ranges[i][-1] >= sorted_ranges[j][-1]:
        print("Remove", sorted_ranges.pop(j), "dominated by", sorted_ranges[i])
    i += 1

print("RESULT", filtered)

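You need to sort by the first element in ascending order and by the last element in descending order. I used two explicit sorted calls (this works because Python's sort is stable, so the second sort preserves the end-descending order among equal starts), but you can also do it in a single call with a comparison function. In Python 3, sorted() no longer accepts cmp, so wrap it with functools.cmp_to_key:

from functools import cmp_to_key
sorted_ranges = sorted(ranges, key=cmp_to_key(lambda x, y: (x[0] - y[0]) if x[0] != y[0] else (y[-1] - x[-1])))

A tuple key does the same thing: sorted(ranges, key=lambda r: (r[0], -r[-1])).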

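This way the dominating ranges come first. Note that after sorting, the nested while loops do O(n) comparisons, because each element is checked only once and is then either removed or added to the final list, so the whole algorithm is O(n log n). (Strictly speaking, list.pop(j) from the middle of a list is itself O(n), so the code as written can degrade to O(n^2) when many ranges are removed; the comparisons alone are linear.)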
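A variant of the same sort-then-sweep idea that avoids list.pop() is to sort once by (start, -end) and keep a range only if it ends after the last range kept. This is a sketch, not the answerer's code, and filter_covered is an illustrative name. Because the input is sorted by start and the kept ranges have strictly increasing ends, comparing against the last kept range is enough, so the pass after sorting really is O(n).

```python
def filter_covered(ranges):
    """Drop every range contained in another one: O(n log n) sort + O(n) sweep."""
    # Start ascending; among equal starts, the longest range (largest end) first.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[-1]))
    kept = []
    for r in ordered:
        # Every kept range starts at or before r, and kept ends strictly increase,
        # so r is covered exactly when it ends no later than the last kept range.
        if kept and r[-1] <= kept[-1][-1]:
            continue
        kept.append(r)
    return kept


ranges = [[0, 1, 2, 3, 4], [1, 2], [0, 1], [2, 3, 4], [3, 4, 5],
          [3, 4, 5, 6], [4, 5], [6, 7], [5, 6]]
print(filter_covered(ranges))  # [[0, 1, 2, 3, 4], [3, 4, 5, 6], [6, 7]]
```

On the question's own data, filter_covered([aa, bb, cc, dd]) returns [dd, aa, bb], i.e. only cc is dropped. The result comes back in start order rather than input order.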

Answer 1 (score: 0)

Use sets, issubset() and filter():

ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]

# Use 'frozenset' as it is hashable to put in a big 'set'
sets = {frozenset(a) for a in ranges}

def f(x):
    # Reject x if it is a subset of some other set in the input
    for y in sets:
        if x == y:
            continue
        if x.issubset(y):
            return False
    return True

result = [list(a) for a in filter(f, sets)]
print('Result=', result)

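The function f rejects any set that is a subset of another set in the input, so filter() keeps only the maximal ones.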

Result= [[3, 4, 5, 6], [0, 1, 2, 3, 4], [6, 7]]

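I haven't done any performance testing, though. Note that the order of the result is arbitrary, since it comes from iterating over a set.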

Answer 2 (score: 0)

My first thought was:

compressed = dict()
# Sort by (start, end) descending, so for each start the longest range comes first
for lst in sorted(RangeList, reverse=True, key=lambda x: (x[0], x[-1])):
    key = lst[0]
    if key not in compressed:
        compressed[key] = lst
print(list(compressed.values()))

But as igon pointed out, this misses ranges nested inside a range with a different start. I think the following fixes that:

# Sort by start ascending and, for equal starts, by end descending
RangeList = sorted(RangeList, reverse=True, key=lambda x: (-x[0], x[-1]))
lst = RangeList[0]
oldstart = lst[0]
oldend = lst[-1]
compressed = {oldstart: lst}
for lst in RangeList[1:]:
    start = lst[0]
    end = lst[-1]
    # Keep lst only if it reaches past the end of the last kept range
    if start not in compressed and oldend < end:
        compressed[start] = lst
        oldstart, oldend = start, end

print(list(compressed.values()))

Answer 3 (score: 0)

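This also uses set and issubset, but first sorts the list by size, and the filter function walks the input in reverse order (largest sets first) to try to find a superset sooner. This can shorten the search in practice, although the worst case is still O(n^2) comparisons.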

ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]

sets = [set(a) for a in ranges]
sets.sort(key=len)           # smallest sets first
reverse_sets = sets[::-1]    # largest sets first, so supersets are found sooner

def f(x):
    # Reject x if it is a subset of some other set, checking large sets first
    for y in reverse_sets:
        if x == y:
            continue
        if x.issubset(y):
            return False
    return True

print('Result=', [list(a) for a in filter(f, sets)])

Result:

Result= [[6, 7], [3, 4, 5, 6], [0, 1, 2, 3, 4]]