Question

I'm trying to find an efficient solution in Python for the following:

>>> func([1,2,3], [1,2])
[(1,1), (1,2), (1,3), (2,2), (2,3)]

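This is similar to itertools.combinations_with_replacement, except that it can take multiple iterables. It is also similar to itertools.product, except that it omits results that are duplicates when order is ignored.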

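All inputs are prefixes of the same series (i.e. they all start with the same elements and follow the same pattern, but may have different lengths).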

The function must accept any number of iterables as input.

Given lists A, B, C, ..., here is a sketch of an algorithm that generates these results:

assert len(A) <= len(B) <= len(C) <= ...
for i in range(len(A)):
    for j in range(i, len(B)):
        for k in range(j, len(C)):
            # ... one more nested loop per additional list ...
            yield A[i], B[j], C[k], ...
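For a fixed number of lists the sketch runs as written. Below is a minimal instance for exactly two lists (the name prefix_pairs is hypothetical); each additional list would add one more nested loop whose start index is the enclosing loop's index, which keeps the indices non-decreasing.

```python
def prefix_pairs(a, b):
    """Hypothetical two-list instance of the nested-loop sketch."""
    # Sort by length so the shorter prefix drives the outer loop,
    # matching the sketch's assumption len(A) <= len(B).
    a, b = sorted((a, b), key=len)
    for i in range(len(a)):
        # Start the inner index at i so the index pair never decreases.
        for j in range(i, len(b)):
            yield a[i], b[j]


print(list(prefix_pairs([1, 2, 3], [1, 2])))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]
```

This reproduces the expected output from the question; the hard part is generalizing it to an arbitrary number of lists without recursion.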

What I can't do

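Use itertools.product and filter the results. This has to be efficient.
Use recursion. The function-call overhead would make it slower than using itertools.product and filtering, for a reasonable number of iterables.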

I suspect this can be done with itertools, but I don't know how.

Edit: I'm looking for the solution that takes the least time.

Edit 2: There seems to be some confusion about what I'm trying to optimize. I'll illustrate with an example.

>>> len(list(itertools.product( *[range(8)] * 5 )))
32768
>>> len(list(itertools.combinations_with_replacement(range(8), 5)))
792

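The first line gives the number of order-dependent outcomes of rolling five 8-sided dice. The second gives the number of order-independent outcomes. However fast itertools.product is, it needs over 40 times as many iterations (32768 vs. 792) to get a result as itertools.combinations_with_replacement does. I'm looking for something like itertools.combinations_with_replacement that works with multiple iterables, so as to minimize the number of iterations or the running time. (product runs in $O(M^N)$, while combinations_with_replacement runs in $O\left(\binom{M+N-1}{N}\right)$, where M is the number of sides on a die and N is the number of dice.)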
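The two counts above follow from the formulas in the previous paragraph. A quick check, assuming Python 3.8+ for math.comb:

```python
import itertools
import math

M, N = 8, 5  # sides per die, number of dice

# Count outputs without building the full lists in memory.
ordered = sum(1 for _ in itertools.product(range(M), repeat=N))
unordered = sum(1 for _ in itertools.combinations_with_replacement(range(M), N))

assert ordered == M ** N == 32768
# Multisets of size N drawn from M values: C(M + N - 1, N).
assert unordered == math.comb(M + N - 1, N) == 792
print(ordered, unordered)
```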

Answer 1

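This solution uses neither recursion nor filtering. It generates only non-decreasing sequences of indices, so it works only for prefixes of the same collection (which must support indexing). Also, it uses only indices to identify elements, so it doesn't require the elements of the series to be comparable, or even hashable.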

def prefixCombinations(coll, prefixes):
    "produces combinations of elements of the same collection prefixes"
    prefixes = sorted(prefixes)  # sorting doesn't change the result, since the combinations are unordered
    n = len(prefixes)
    indices = [0] * n
    while True:
        yield tuple(coll[i] for i in indices)
        # search backwards for an index that has not reached its maximum
        for i in range(n - 1, -1, -1):
            if indices[i] < prefixes[i] - 1:
                break
        # if every index has reached its maximum, we're done
        else:
            break
        # advance that index and reset all later indices to the same level,
        # which keeps the index sequence non-decreasing
        level = indices[i] + 1
        for i in range(i, n):
            indices[i] = level

Examples

>>> list(prefixCombinations([1,2,3,4,5], (3,2)))
[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]

>>> list(prefixCombinations([1,2,3,4,5], (3,2,5)))
[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 1, 5), (1, 2, 2), (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 3), (1, 3, 4), (1, 3, 5), (2, 2, 2), (2, 2, 3), (2, 2, 4), (2, 2, 5), (2, 3, 3), (2, 3, 4), (2, 3, 5)]

>>> from itertools import combinations_with_replacement
>>> tuple(prefixCombinations(range(10),[10]*4)) == tuple(combinations_with_replacement(range(10),4))
True
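The last example compares against combinations_with_replacement only for equal-length prefixes. A sketch of a cross-check for unequal prefixes, using a deliberately slow reference (the hypothetical brute_force) that walks the full product of index ranges and keeps non-decreasing index tuples. prefixCombinations is copied from above with cosmetic changes so the block runs on its own.

```python
import itertools


def prefixCombinations(coll, prefixes):
    """Copy of the generator above (index-based, non-recursive)."""
    prefixes = sorted(prefixes)
    n = len(prefixes)
    indices = [0] * n
    while True:
        yield tuple(coll[i] for i in indices)
        for i in range(n - 1, -1, -1):
            if indices[i] < prefixes[i] - 1:
                break
        else:
            return  # every index is at its maximum
        level = indices[i] + 1
        for k in range(i, n):
            indices[k] = level


def brute_force(coll, prefixes):
    """Hypothetical reference: full product, keep non-decreasing index tuples."""
    ranges = [range(p) for p in sorted(prefixes)]
    for idx in itertools.product(*ranges):
        if all(x <= y for x, y in zip(idx, idx[1:])):
            yield tuple(coll[i] for i in idx)


coll = "abcdefgh"  # any indexable sequence; only indices are compared
for prefixes in [(3, 2), (3, 2, 5), (1, 4, 4), (8, 8, 8, 8)]:
    assert list(prefixCombinations(coll, prefixes)) == list(brute_force(coll, prefixes))
print("all checks passed")
```

Both generators emit tuples in lexicographic order of their indices, which is why the lists can be compared directly rather than as sets.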

Answer 2

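Since this is a generator, it doesn't change the performance in any meaningful way: it only adds an O(n) filtering pass over the output of itertools.product. This version takes exactly two iterables: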

import itertools

def product(*args):
    for a, b in itertools.product(*args):
        if a >= b:
            yield b, a

print(list(product([1,2,3], [1,2])))

Output:

[(1, 1), (1, 2), (2, 2), (1, 3), (2, 3)]

Or even:

product = lambda a, b: ((y, x) for x in a for y in b if x >= y)
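The same filtering idea extends to any number of iterables. A minimal sketch (filtered_product is a hypothetical name), assuming every input is a prefix of the same strictly increasing series, so a tuple is canonical exactly when it is non-decreasing. It still walks the entire product, i.e. $O(M^N)$ iterations, so it is the baseline the question wants to beat rather than a faster method.

```python
import itertools


def filtered_product(*iterables):
    """Keep only non-decreasing tuples from the full Cartesian product.

    Assumes all inputs are prefixes of one strictly increasing series.
    """
    # Shorter prefixes first, as in the question's len(A) <= len(B) <= ...
    ordered = sorted((list(it) for it in iterables), key=len)
    for combo in itertools.product(*ordered):
        if all(x <= y for x, y in zip(combo, combo[1:])):
            yield combo


print(list(filtered_product([1, 2, 3], [1, 2])))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]
```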

Answer 3

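Here is an implementation. The idea is to impose a canonical order by using sorted containers, and in this way avoid duplicates: instead of carrying duplicates from one step to the next, I avoid having to filter them out at the end.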

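It relies on the "sortedcontainers" library, which provides fast sorted containers (as fast as C implementations). [I am not affiliated with this library in any way.]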

from sortedcontainers import SortedList as SList
#see at http://www.grantjenks.com/docs/sortedcontainers/

def order_independant_combination(*args):
    filtered = 0

    previous= set()
    current = set()

    for iterable in args:
        if not previous:
            for elem in iterable:
                current.add((elem,))

        else:
            for elem in iterable:
                for combination in previous:
                    newCombination = SList(combination)
                    newCombination.add(elem)
                    newCombination = tuple(newCombination)
                    if newCombination not in current:
                        current.add(newCombination)
                    else:
                        filtered += 1

        previous = current
        current = set()

    if filtered != 0:
        print("{0} duplicates have been filtered during geneeration process".format(filtered))
    return list(SList(previous))

if __name__ == "__main__":
    result = order_independant_combination(*[range(8)] * 5)
    print("Generated a result of length {0} that is {1}".format(len(result), result))

For the example from the question, order_independant_combination([1,2,3], [1,2]) returns:

[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]

You can try passing more iterables as arguments; it works.

I hope that, even if it doesn't solve your problem, it at least helps.

Vaisse Arthur。

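Edit: in response to the comments. That is not a good analysis. Filtering duplicates during generation is more efficient than using itertools.product and then filtering duplicates out of the result. Eliminating duplicates at one step avoids generating duplicate solutions in all the following steps.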

Running this:

if __name__ == "__main__":
    result = order_independant_combination([1,2,3],[1,2],[1,2],[1,2])
    print("Generated a result of length {0} that is {1}".format(len(result), result))

I get the following result:

9 duplicates have been filtered during geneeration process
Generated a result of length 9 that is [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3), (1, 1, 2, 2), (1, 1, 2, 3), (1, 2, 2, 2), (1, 2, 2, 3), (2, 2, 2, 2), (2, 2, 2, 3)]

Using itertools, I get this:

>>> import itertools
>>> c =  list(itertools.product([1,2,3],[1,2],[1,2],[1,2]))
>>> c
[(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 1), (1, 1, 2, 2), (1, 2, 1, 1), (1, 2, 1, 2), (1, 2, 2, 1), (1, 2, 2, 2), (2, 1, 1, 1), (2, 1, 1, 2), (2, 1, 2, 1), (2, 1, 2, 2), (2, 2, 1, 1), (2, 2, 1, 2), (2, 2, 2, 1), (2, 2, 2, 2), (3, 1, 1, 1), (3, 1, 1, 2), (3, 1, 2, 1), (3, 1, 2, 2), (3, 2, 1, 1), (3, 2, 1, 2), (3, 2, 2, 1), (3, 2, 2, 2)]
>>> len(c)
24

A simple calculation gives:

pruned generation: 9 results + 9 elements filtered -> 18 elements generated.
itertools: 24 elements generated.

The more iterables you give it, and the longer they are, the larger the difference becomes.

Example:

result = order_independant_combination([1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5])
print("Generated a result of length {0} that is {1}".format(len(result), result))

Result:

155 duplicates have been filtered during geneeration process
Generated a result of length 70 ...

With itertools:

>>> len(list(itertools.product([1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5])))
625

A difference of 400 elements.

Edit 2: With *[range(8)] * 5, it reports "2674 duplicates have been filtered during geneeration process. Generated a result of length 792..."
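The duplicate counts reported in this answer can be reproduced without sortedcontainers. A sketch (stepwise_dedup is a hypothetical name) that swaps SortedList for the built-in tuple(sorted(...)) to restore the canonical order at each step, and returns the duplicate count instead of printing it:

```python
def stepwise_dedup(*iterables):
    """Hypothetical stdlib-only variant of order_independant_combination.

    Each partial combination is stored as a sorted tuple (its canonical
    form), so order-dependent duplicates collapse at every step.
    Returns (sorted list of combinations, number of duplicates skipped).
    """
    previous = set()
    filtered = 0
    for step, iterable in enumerate(iterables):
        current = set()
        for elem in iterable:
            if step == 0:
                current.add((elem,))
                continue
            for combination in previous:
                # sorted() stands in for SortedList.add: restore canonical order.
                candidate = tuple(sorted(combination + (elem,)))
                if candidate in current:
                    filtered += 1
                else:
                    current.add(candidate)
        previous = current
    return sorted(previous), filtered


result, filtered = stepwise_dedup([1, 2, 3], [1, 2], [1, 2], [1, 2])
print(filtered, len(result))  # 9 9
```

The count does not depend on set iteration order: at each step it equals the number of candidates built minus the number of distinct ones kept.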

Efficient combinations with replacement for multiple iterables, or an order-independent product