Python - Building a sub-list that meets certain criteria from a large number of combinations

Time: 2012-12-11 18:53:32

Tags: python for-loop itertools

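I've been reading for a long time, and this is the first time I haven't been able to find an answer for something I'm working on.

I have a list of 93 strings, each 6 characters long. From those 93 strings, I want to identify a set of 20 that all meet a particular criterion relative to the others in the set. While itertools.combinations will give me every possible combination, not all of them are worth checking.

For instance, if [list[0], list[1], etc.] fails because list[0] and list[1] cannot be together, then it doesn't matter what the other 18 strings are: the set will fail every time, and that is a lot of wasted checking.

Currently I have this working with 20 nested for loops, but it seems like there has to be a better/faster way to do it: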

for n1 in bclist:
    building = [n1]
    n2bclist = [bc for bc in bclist if bc not in building]
    for n2 in n2bclist:              #this is the start of what gets repeated 19 times
        building.append(n2)
        if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
            building.remove(n2)
            continue
        n3bclist = [bc for bc in bclist if bc not in building]
        #insert the additional 19 for loops, with n3 in n3, n4 in n4, etc
        building.remove(n2)

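There are print statements in the 20th for loop to alert me if a set of 20 even exists. The for statements at least let me skip sets early when a single addition fails, but there is no memory of which larger combinations have already failed:

For instance, [list[0], list[1]] fails, so it skips to [list[0], list[2]], which passes. Next is [list[0], list[2], list[1]], which will fail because 0 and 1 are together again, so it moves on to [list[0], list[2], list[3]], which may or may not pass. My concern is that eventually it will also test: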

  • [list[0], list[3], list[2]]
  • [list[2], list[0], list[3]]
  • [list[2], list[3], list[0]]
  • [list[3], list[0], list[2]]
  • [list[3], list[2], list[0]]

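All of these combinations will have the same outcome as the earlier ones. Basically, I traded one devil for another. With itertools.combinations, I test every combination, including sets I already know will fail because of their early values. With the for loops, the order of the values is treated as a factor even though I don't care about their order. Both methods significantly increase the time my code takes to complete.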

Any ideas on how to get rid of both devils would be greatly appreciated.

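4 Answers:

Answer 0: (score: 1)

Use your current method, but keep track of the indexes too, so that in your inner loops you can skip elements you have already checked: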

bcenum = list(enumerate(bclist))
for i1, n1 in bcenum:
    building = [n1]
    for i2, n2 in bcenum[i1+1:]:              #this is the start of what gets repeated 19 times
        building.append(n2)
        if test_function(building): #does set fail? (counter intuitive, True when fail, False when pass)
            building.remove(n2)
            continue
        for i3, n3 in bcenum[i2+1:]:
            pass  # more nested loops go here, each slicing from its own index + 1
        building.remove(n2)
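The same index-tracking idea can be written once as a recursive helper instead of 20 hand-written loops. This is only a sketch: the name `search`, the toy `bclist`, and the toy `test_function` are assumptions for illustration, not part of the original code.

```python
def search(bclist, size, test_function, start=0, building=None):
    """Yield every index-ordered subset of `size` items that never fails.

    test_function(building) returns True when the partial set FAILS.
    """
    if building is None:
        building = []
    if len(building) == size:
        yield list(building)
        return
    for i in range(start, len(bclist)):
        building.append(bclist[i])
        if not test_function(building):
            # Only look at later indexes, so no ordering is visited twice.
            yield from search(bclist, size, test_function, i + 1, building)
        building.pop()


def toy_test(building):
    # Hypothetical rule: "a" and "b" cannot be together.
    return "a" in building and "b" in building


found = list(search(["a", "b", "c", "d"], 3, toy_test))
print(found)
```

Passing `size=20` and the real `test_function` would correspond to the 20 nested loops, with the recursion depth standing in for the loop level.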

Answer 1: (score: 1)

def gen(l, n, test, prefix=()):
  if n == 0:
    yield prefix
  else:
    for i, el in enumerate(l):
      if not test(prefix + (el,)):
        for sub in gen(l[i+1:], n - 1, test, prefix + (el,)):
          yield sub

def test(l):
  return sum(l) % 3 == 0 # just a random example for testing

print(list(gen(range(5), 3, test)))

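This will select subsets of cardinality n from l such that test(subset) == False for the subset and for every prefix built along the way.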

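It tries to avoid unnecessary work. However, given that there are on the order of 10^20 ways of choosing 20 elements out of 93, you may need to rethink your overall approach.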
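To get a feel for that number, here is a quick check using `math.comb` (available from Python 3.8). The throughput figure is a made-up assumption, used only to give a sense of scale:

```python
import math

# Number of unordered 20-element subsets of 93 items.
total = math.comb(93, 20)
print("%.2e subsets" % total)

# Hypothetical throughput, purely for scale: one million checks per second.
checks_per_second = 1000000
years = total / checks_per_second / (3600 * 24 * 365)
print("about %.1e years at that rate" % years)
```

Pruning on failed prefixes helps only to the extent that failures show up early, which depends entirely on the real test.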

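Answer 2: (score: 0)

You can take advantage of two aspects of the problem:

  1. Order doesn't matter
  2. If test_function(L) is True (L fails), then test_function of any list that contains L will also be True
  3. You can also simplify things a bit by dealing with the indices 0-92 rather than list[0] - list[92]; it is only inside test_function that we might care what the contents of the list are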

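    The following code does this by first finding viable pairs, then sets of four, sets of eight, and sets of sixteen. Finally, it finds all viable combinations of a sixteen and a four to get your list of 20. However, there were over 100,000 sets of eight, so it was still too slow and I gave up. You could possibly do something along the same lines and speed it up with itertools, but probably not enough.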

    target = range(5, 25)
    def test_function(L):
        for i in L:
            if i not in target:
                return True   # fails: i is not part of the target set
        return False          # passes: every element is in the target set
    def possible_combos(A, B):
        """
        Find all possible pairings of a list within A and a list within B
        """
        R = []
        for i in A:
            for j in B:
                if i[-1] < j[0] and not test_function(i + j):
                    R.append(i + j)
        return R
    def possible_doubles(A):
        """
        Find all possible pairings of two lists within A
        """
        R = []
        for n, i in enumerate(A):
            for j in A[n + 1:]:
                if i[-1] < j[0] and not test_function(i + j):
                    R.append(i + j)
        return R
    # First, find all pairs that are okay
    L = range(93)  # indices 0-92, one per string
    pairs = []
    for i in L:
        for j in L[i + 1:]:
            if not test_function([i, j]):
                pairs.append([i, j])

    # Then, all pairs of pairs
    quads = possible_doubles(pairs)
    print("fours", len(quads), quads[0])
    # Then all sets of eight, and sixteen
    eights = possible_doubles(quads)
    print("eights", len(eights), eights[0])
    sixteens = possible_doubles(eights)
    print("sixteens", len(sixteens), sixteens[0])

    # Finally check all possible combinations of a sixteen plus a four
    possible_solutions = possible_combos(sixteens, quads)
    print(len(possible_solutions), possible_solutions[0])
    
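    Edit: I've found a much better solution. First, identify all pairs of values within the range (0-92) that conform to test_function, keeping the pairs in order. Presumably the first value of the first pair must be the first value of the solution, and the second value of the last pair must be the last value of the solution (but check: is this assumption true for your test_function? If it isn't a safe assumption, you will need to repeat find_paths for all possible start and finish values). Then find a path from the first value to the last value that is 20 values long and also conforms to test_function.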

    def test_function(S):
        for i in S:
            if not i in target:
                return True
        return False
    
    def find_paths(p, f):
        """ Find paths from end of p to f, check they are the right length,
            and check they conform to test_function
        """
        successful = []
        if p[-1] in pairs_dict:
            for n in pairs_dict[p[-1]]:
                p2 = p + [n]
                if (n == f and len(p2) == target_length and
                    not test_function(p2)):
                    successful.append(p2)
                else:
                    successful += find_paths(p2, f)
        return successful
    
    list_length = 93              # this is the number of possible elements
    target = [i * 2 for i in range(5, 25)] 
        # ^ this is the unknown target list we're aiming for...
    target_length = len(target)   # ... we only know its length
    L = range(list_length)        # indices 0 .. list_length - 1
    pairs = []
    for i in L:
        for j in L[i + 1:]:
            if not test_function([i, j]):
                pairs.append([i, j])
    firsts = [a for a, b in pairs]
    nexts = [[b for a, b in pairs if a == f] for f in firsts]
    pairs_dict = dict(zip(firsts, nexts))
    print("Found solution(s):", find_paths([pairs[0][0]], pairs[-1][1]))
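One possible refinement, which is not part of the answer above, is to stop extending a path as soon as it reaches the target length, so that over-long paths are never built. The sketch below is self-contained and uses a small made-up target; `build_successors`, `find_paths_pruned`, and `toy_test` are hypothetical names.

```python
def build_successors(n_items, test_function):
    """Map each index to the larger indices it can be paired with."""
    successors = {}
    for i in range(n_items):
        for j in range(i + 1, n_items):
            if not test_function([i, j]):
                successors.setdefault(i, []).append(j)
    return successors


def find_paths_pruned(path, finish, successors, target_length, test_function):
    """Like find_paths, but never extends a path beyond target_length."""
    if len(path) == target_length:
        ok = path[-1] == finish and not test_function(path)
        return [path] if ok else []
    found = []
    for nxt in successors.get(path[-1], []):
        found += find_paths_pruned(path + [nxt], finish, successors,
                                   target_length, test_function)
    return found


toy_target = {2, 4, 6, 8}  # made-up hidden target for the example


def toy_test(S):
    # True means the selection FAILS.
    return any(i not in toy_target for i in S)


succ = build_successors(10, toy_test)
paths = find_paths_pruned([2], 8, succ, len(toy_target), toy_test)
print(paths)
```

Whether this saves much time depends on how many viable successors each index has under the real test.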
    

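Answer 3: (score: 0)

You should base your solution on itertools.combinations, as that will take care of the ordering issue; short-circuit filtering is comparatively easy to add.

Recursive solution

Let's quickly recap how combinations works; the easiest way to implement it is to take the nested-for approach and convert it to recursive style: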

def combinations(iterable, r):
    pool = tuple(iterable)
    for i in range(0, len(pool)):
        for j in range(i + 1, len(pool)):
            ...
                yield (i, j, ...)

Converting to recursive form:

def combinations(iterable, r):
    pool = tuple(iterable)
    def inner(start, k, acc):
        if k == r:
            yield acc
        else:
            for i in range(start, len(pool)):
                for t in inner(i + 1, k + 1, acc + (pool[i], )):
                    yield t
    return inner(0, 0, ())

Now applying a filter is easy:

def combinations_filterfalse(predicate, iterable, r):
    pool = tuple(iterable)
    def inner(start, k, acc):
        if predicate(acc):
            return
        elif k == r:
            yield acc
        else:
            for i in range(start, len(pool)):
                for t in inner(i + 1, k + 1, acc + (pool[i], )):
                    yield t
    return inner(0, 0, ())

Let's check it:

>>> list(combinations_filterfalse(lambda t: sum(t) % 2 == 1, range(5), 2))
[(0, 2), (0, 4), (2, 4)]
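Applied to 6-character strings, a usage sketch might look like the following. The failure rule (`too_similar`, a minimum number of differing positions) and the sample strings are assumptions made up for the example, since the question does not say what the real criterion is. The predicate only compares the newest string with the earlier ones, because every shorter prefix has already passed.

```python
def combinations_filterfalse(predicate, iterable, r):
    """Yield r-combinations, abandoning any prefix for which predicate is True."""
    pool = tuple(iterable)

    def inner(start, k, acc):
        if predicate(acc):
            return
        if k == r:
            yield acc
            return
        for i in range(start, len(pool)):
            yield from inner(i + 1, k + 1, acc + (pool[i],))

    return inner(0, 0, ())


def too_similar(acc, min_distance=3):
    """Hypothetical rule: fail if the newest string differs from an earlier
    one in fewer than min_distance positions."""
    if len(acc) < 2:
        return False
    newest = acc[-1]
    return any(sum(a != b for a, b in zip(newest, old)) < min_distance
               for old in acc[:-1])


barcodes = ["AAAAAA", "AAAAAT", "TTTAAA", "AAATTT"]
result = list(combinations_filterfalse(too_similar, barcodes, 3))
print(result)
```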

Iterative solution

The actual implementation of itertools.combinations, as listed in the documentation, uses an iterative loop:

def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))  # must be a mutable list (range is immutable in Python 3)
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

To adapt this elegantly for a predicate, it is necessary to reorder the loop slightly:

def combinations_filterfalse(predicate, iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n or predicate(()):
        return
    elif r == 0:
        yield ()
        return
    indices, i = list(range(r)), 0  # mutable list of indices, current depth
    while True:
        while indices[i] + r <= i + n:
            t = tuple(pool[k] for k in indices[:i+1])
            if predicate(t):
                indices[i] += 1
            elif len(t) == r:
                yield t
                indices[i] += 1
            else:
                indices[i+1] = indices[i] + 1
                i += 1
        if i == 0:
            return
        i -= 1
        indices[i] += 1

Checking again:

>>> list(combinations_filterfalse(lambda t: sum(t) % 2 == 1, range(5), 2))
[(0, 2), (0, 4), (2, 4)]
>>> list(combinations_filterfalse(lambda t: t == (1, 4), range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
>>> list(combinations_filterfalse(lambda t: t[-1:] == (3,), range(5), 2))
[(0, 1), (0, 2), (0, 4), (1, 2), (1, 4), (2, 4)]
>>> list(combinations_filterfalse(lambda t: False, range(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
>>> list(combinations_filterfalse(lambda t: False, range(5), 0))
[()]

Comparison

It turns out that the recursive solution is not only simpler but also faster:

In [33]: timeit list(combinations_filterfalse_rec(lambda t: False, range(20), 5))
10 loops, best of 3: 24.6 ms per loop

In [34]: timeit list(combinations_filterfalse_it(lambda t: False, range(20), 5))
10 loops, best of 3: 76.6 ms per loop
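For anyone wanting to repeat a measurement outside IPython, a minimal sketch with the standard `timeit` module could look like this. It compares the recursive pruning version against generating every combination and filtering afterwards. The `clash` rule is a made-up assumption, and the timings will vary with the machine and with how often the predicate actually prunes.

```python
import timeit
from itertools import combinations


def combinations_filterfalse_rec(predicate, iterable, r):
    pool = tuple(iterable)

    def inner(start, k, acc):
        if predicate(acc):
            return
        if k == r:
            yield acc
            return
        for i in range(start, len(pool)):
            yield from inner(i + 1, k + 1, acc + (pool[i],))

    return inner(0, 0, ())


def clash(t):
    # Hypothetical failure rule: 0 and 1 may not appear together.
    return 0 in t and 1 in t


def pruned():
    return list(combinations_filterfalse_rec(clash, range(20), 5))


def unpruned():
    return [c for c in combinations(range(20), 5) if not clash(c)]


t_pruned = timeit.timeit(pruned, number=3)
t_unpruned = timeit.timeit(unpruned, number=3)
print("pruned: %.4fs, generate-then-filter: %.4fs" % (t_pruned, t_unpruned))
```

Both functions return the same combinations here because the rule is monotone: once a prefix fails, every extension of it fails as well.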