Question

I am trying to create a generator function:

def combinations(iterable, r, maxGapSize):
    maxGapSizePlusOne = maxGapSize+1

    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))

    while True:
        for i in reversed(range(r)):        
            if indices[i] != i + n - r:     
                break
        else:
            return

        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1

        previous = indices[0]
        for k in indices[1:]:
            if k-previous>maxGapSizePlusOne:
                isGapTooBig = True
                break
            previous = k
        else:
            isGapTooBig = False

        if not isGapTooBig:
            print(indices)


combinations(("Aa","Bbb","Ccccc","Dd","E","Ffff",),2,1)

For debugging, I print the indices that I want to use to select elements from the argument named 'iterable'. This gives me:

[0, 2]
[1, 2]
[1, 3]
[2, 3]
[2, 4]
[3, 4]
[3, 5]
[4, 5]

Ignore [0,1], since that is produced elsewhere...

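This is exactly what I want, but I suspect my code is overly complicated and inefficient. The size of iterable may be in the thousands, and maxGapSize will probably be < 5.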

Any tips to help me do this better?

Answer 1

Most of your code is identical to the Python code for itertools.combinations. The CPython implementation of itertools.combinations is written in C; the itertools documentation shows the equivalent Python code.

You can speed up the function simply by using itertools.combinations instead of the Python equivalent:

import itertools as it
def mycombinations(iterable, r, maxGapSize):
    maxGapSizePlusOne = maxGapSize+1    
    for indices in it.combinations(range(len(iterable)),r):
        previous = indices[0]
        for k in indices[1:]:
            if k-previous>maxGapSizePlusOne:                    
                break
            previous = k
        else:
            yield indices   
            # print(indices)
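
The generator above yields index tuples rather than elements. As a minimal usage sketch (the function is repeated so the block runs on its own, and the names pool and selected are only illustrative), the indices can be mapped back to the elements of the input sequence like this:

```python
import itertools as it

def mycombinations(iterable, r, maxGapSize):
    # Same filter as above: keep index tuples whose neighbouring
    # indices differ by at most maxGapSize + 1.
    maxGapSizePlusOne = maxGapSize + 1
    for indices in it.combinations(range(len(iterable)), r):
        previous = indices[0]
        for k in indices[1:]:
            if k - previous > maxGapSizePlusOne:
                break
            previous = k
        else:
            yield indices

pool = ("Aa", "Bbb", "Ccccc", "Dd", "E", "Ffff")
# Turn each index tuple into a tuple of the corresponding elements.
selected = [tuple(pool[i] for i in indices)
            for indices in mycombinations(pool, 2, 1)]
print(selected[:3])
# [('Aa', 'Bbb'), ('Aa', 'Ccccc'), ('Bbb', 'Ccccc')]
```

Note that len(iterable) requires a sequence, so an arbitrary iterator would have to be converted with tuple() first.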

You can use timeit to compare the relative speed of the alternative implementations like this:

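Original version (assuming print(indices) has been replaced by yield indices, since list() needs a generator to consume):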

% python -mtimeit -s'import test' 'list(test.combinations(("Aa","Bbb","Ccccc","Dd","E","Ffff",),2,1))'
10000 loops, best of 3: 63.9 usec per loop

versus

Using itertools.combinations:

% python -mtimeit -s'import test' 'list(test.mycombinations(("Aa","Bbb","Ccccc","Dd","E","Ffff",),2,1))'
100000 loops, best of 3: 17.2 usec per loop

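The code above generates all the combinations, including the initial combination, tuple(range(r)). I think it is nicer to leave it that way. But if you really want to remove the first combination, you could use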

def mycombinations(iterable, r, maxGapSize):
    ...
    comb=it.combinations(range(len(iterable)),r)
    next(comb)
    for indices in comb:

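By the way, the function combinations does not really depend on iterable. It only depends on the length of iterable. Therefore, it would be better to make the call signature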

def combinations(length, r, maxGapSize):
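
With thousands of elements and a small maxGapSize, filtering every combination still visits far more index tuples than it keeps. One possible alternative, which is not part of the answer above, is to build only the valid index tuples by extending each partial tuple with the next index inside the allowed gap. The name gap_combinations and its helper extend are hypothetical; the sketch assumes the same gap rule as the code above (neighbouring indices differ by at most maxGapSize + 1) and keeps the initial combination.

```python
def gap_combinations(length, r, maxGapSize):
    """Yield r-tuples of increasing indices below `length` whose
    neighbouring indices differ by at most maxGapSize + 1."""
    # Assumption: r <= 0 yields nothing (the filter version does not
    # handle r == 0 either, because it reads indices[0]).
    if r <= 0 or r > length:
        return
    max_step = maxGapSize + 1

    def extend(prefix):
        if len(prefix) == r:
            yield tuple(prefix)
            return
        remaining = r - len(prefix)  # indices still to pick
        last = prefix[-1]
        # The next index must stay within the gap limit and leave
        # room for the indices that still have to follow it.
        upper = min(last + max_step, length - remaining)
        for nxt in range(last + 1, upper + 1):
            prefix.append(nxt)
            yield from extend(prefix)
            prefix.pop()

    for start in range(length - r + 1):
        yield from extend([start])

print(list(gap_combinations(6, 2, 1)))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
```

The tuples come out in the same lexicographic order as itertools.combinations produces them, but the work done is roughly proportional to the number of tuples kept rather than to the total number of combinations. Whether this beats the C-backed filter in practice depends on the sizes involved, so it is worth timing both.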
