Question

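I want to avoid redundancy when generating all possible combinations from a list of strings (for example, 1122 is the same as 2211 in my context, so only one of the two should appear in the resulting list).

I would also like to apply a filter while combining. For example, I don't want any string that contains 3 in the result.

How should I handle this logic?

This code does the combining: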

>>> keywords = [''.join(i) for i in itertools.product(['11', '22', '33'], repeat = 2)]
>>> keywords
['1111', '1122', '1133', '2211', '2222', '2233', '3311', '3322', '3333']

Answer 1

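Depending on your actual data there may be a more efficient approach, but the algorithm below works. We eliminate the duplicates with a simple comparison: a pair is kept only when u <= v, so '2211' is dropped in favor of '1122'. I've put the check for '3' into a function. That is slightly slower than doing it directly in the list comprehension, but it makes the code more general.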

import itertools

bases = ['11', '22', '33', '44']

def is_ok(u, v):
    ''' Test if a u,v pair is permitted '''
    return not ('3' in u or '3' in v)

keywords = [u+v for u, v in itertools.product(bases, repeat = 2) if u <= v and is_ok(u, v)]

print(keywords)

Output

['1111', '1122', '1144', '2222', '2244', '4444']
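Because the '3' check here looks at each part on its own, one possibly cheaper variant is to filter the bases before building any pairs. This is only a sketch under that assumption; the name `allowed` is made up for illustration, and a filter that depends on both parts together would still need something like is_ok.

```python
import itertools

bases = ['11', '22', '33', '44']

# Drop any base containing '3' up front, so no rejected pair is ever built.
allowed = [b for b in bases if '3' not in b]

# Keep only pairs in non-decreasing order, so '2211' never appears next to '1122'.
keywords = [u + v for u, v in itertools.product(allowed, repeat=2) if u <= v]

print(keywords)  # ['1111', '1122', '1144', '2222', '2244', '4444']
```

With four bases this builds 9 candidate pairs instead of 16, and the saving grows with the number of excluded bases.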

Answer 2

The same result can be achieved by filtering itertools.combinations_with_replacement:

Code

import itertools as it


bases = ["11", "22", "33", "44"]

[x+y for x,y in it.combinations_with_replacement(bases, 2) if "3" not in x+y]
# ['1111', '1122', '1144', '2222', '2244', '4444']

This version is more general and does not rely on comparing numeric strings.
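To illustrate the "more general" point, here is a small sketch that wraps the same idea in a function. The function name `unique_keywords` and its parameters `r` and `banned` are hypothetical, not part of itertools. It shows that the result follows the order of the input list rather than the sort order of the strings.

```python
import itertools as it

def unique_keywords(bases, r=2, banned="3"):
    """Join every order-insensitive selection of r bases, skipping any that contain `banned`."""
    return [
        "".join(combo)
        for combo in it.combinations_with_replacement(bases, r)
        if banned not in "".join(combo)
    ]

# The bases are deliberately unsorted: pairs are deduplicated by position, not by value.
print(unique_keywords(["44", "11", "22", "33"]))
# ['4444', '4411', '4422', '1111', '1122', '2222']
```

Note that '4411' is kept and '1144' is not, because "44" comes before "11" in the input. If a particular ordering inside each keyword matters, sort the bases first.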

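Details

From the docs we can understand why this works:

The code for combinations_with_replacement() can also be expressed as a subsequence of product() after filtering entries where the elements are not in sorted order (according to their position in the input pool):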

def combinations_with_replacement(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    for indices in product(range(n), repeat=r):
        if sorted(indices) == list(indices):
            yield tuple(pool[i] for i in indices)

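In this way, each item is associated with a unique index. When the indices of two items are compared, only the sorted index combinations are used to yield items. The remaining index combinations are discarded because an equivalent one has already been seen.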

(0, 0)
(0, 1)
(0, 2)
(0, 3)
(1, 0)                                         # discarded
(1, 1)
...
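The index listing above can be checked directly. The following sketch (variable names are illustrative) splits the index pairs from product() into kept and discarded ones and confirms that the kept ones reproduce combinations_with_replacement exactly:

```python
from itertools import combinations_with_replacement, product

bases = ["11", "22", "33", "44"]
n = len(bases)

# Index pairs from product() that are already in sorted order.
kept = [idx for idx in product(range(n), repeat=2) if sorted(idx) == list(idx)]
# Index pairs that get thrown away, e.g. (1, 0).
discarded = [idx for idx in product(range(n), repeat=2) if sorted(idx) != list(idx)]

# Mapping the kept indices back to items gives the same sequence, in the same order.
from_product = [tuple(bases[i] for i in idx) for idx in kept]
assert from_product == list(combinations_with_replacement(bases, 2))

print(len(kept), len(discarded))  # 10 6
```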

See the docs for more details on the similarities between this tool and itertools.product.

Answer 3

This should do what you want:

import itertools

def main():

    newkeywords = []
    keywords = ["".join(i) for i in itertools.product(["11", "22", "33"], repeat = 2)]
    for item in keywords:
        newitem = "".join(sorted(item))
        if "3" not in newitem and newitem not in newkeywords:
            newkeywords.append(newitem)
    print(newkeywords)

main()

Result:

['1111', '1122', '2222']

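It builds a new list. Each item is sorted first (which makes 2211 the same as 1122), and it is skipped if the sorted form is already in the new list or if it contains the digit "3".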
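One caveat: sorting the characters of the joined string would also merge keywords built from different parts, for instance '12' + '21' and '11' + '22' both sort to '1122'. A possible variant, sketched here with illustrative names, sorts the parts instead of the characters and tracks what has been seen in a set, so each membership test does not rescan a list:

```python
import itertools

bases = ["11", "22", "33"]

seen = set()      # canonical forms already emitted
keywords = []
for parts in itertools.product(bases, repeat=2):
    key = tuple(sorted(parts))   # ('22', '11') and ('11', '22') share one key
    word = "".join(key)
    if "3" in word or key in seen:
        continue
    seen.add(key)
    keywords.append(word)

print(keywords)  # ['1111', '1122', '2222']
```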

How to avoid redundancy and apply a filter to string combinations
