Efficient method for computing combinations of adjacent data with a specific number of groups?

Time: 2019-07-06 23:18:59

Tags: python-3.x itertools combinatorics

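I have a list of elements, and I want to find every way of arranging them, preserving their order, into 'n' groups.

For example, given the ordered list A, B, C, D, E and a target of only 2 groups, the four solutions should be: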

ABCD, E
ABC, DE
AB, CDE
A, BCDE

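With the help of another StackOverflow post, I have come up with a working brute-force solution. It computes every possible grouping for every possible number of groups, and from those I simply extract the cases that match my target number of groups.

For a reasonable number of elements this works fine, but as I increase the number of elements the number of groupings grows very quickly. Is there a clever way to restrict the computed solutions to only those that match my target number of groups?

The code so far is as follows: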
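To put numbers on that growth: a sequence of n elements has n-1 gaps between neighbours, and the brute-force approach decides "cut" or "no cut" for each gap, so it visits 2**(n-1) groupings. The helper name `brute_force_size` below is only illustrative; this is a rough sketch of the count, not part of the solution itself.

```python
def brute_force_size(n):
    """Number of order-preserving groupings of n elements, over all group counts."""
    # Each of the n-1 gaps is independently cut or not cut.
    return 2 ** (n - 1)

for n in (5, 10, 20, 30):
    print(n, brute_force_size(n))
# 5 16
# 10 512
# 20 524288
# 30 536870912
```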

import itertools
import string
import collections

def generate_combination(source, comb):
    res = []
    for x, action in zip(source,comb + (0,)):
        res.append(x)
        if action == 0:
            yield "".join(res)
            res = []

#Create a list of first 20 letters of the alphabet
seq = list(string.ascii_uppercase[0:20])
seq

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T']

#Generate all possible combinations
combinations = [list(generate_combination(seq,c)) for c in itertools.product((0,1), repeat=len(seq)-1)]
len(combinations)

524288

#Create a list that counts the number of groups in each solution, 
#and counter to allow easy query
group_counts = [len(i) for i in combinations]
count_dic = collections.Counter(group_counts)
count_dic[1], count_dic[2], count_dic[3], count_dic[4], count_dic[5], count_dic[6]

(1,19,171,969,3876,11628)

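As you can see, more than half a million groupings (524,288) are computed, yet if I only want those with exactly 5 groups, just 3,876 of them are relevant.

Any suggestions?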

1 answer:

Answer 0: (score: 1)

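Partitioning seq into 5 parts is equivalent to choosing 4 positions from range(1, len(seq)) at which to cut seq. You can therefore use itertools.combinations(range(1, len(seq)), 4) to generate all partitions of seq into 5 parts: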

import itertools as IT
import string

def partition_into_n(iterable, n, chain=IT.chain, map=map):
    """
    Return a generator of all partitions of iterable into n parts.
    Based on http://code.activestate.com/recipes/576795/ (Raymond Hettinger)
    which generates all partitions.
    """
    s = iterable if hasattr(iterable, '__getitem__') else tuple(iterable)
    size = len(s)
    first, middle, last = [0], range(1, size), [size]
    getitem = s.__getitem__
    return (map(getitem, map(slice, chain(first, div), chain(div, last)))
            for div in IT.combinations(middle, n-1))

seq = list(string.ascii_uppercase[0:20])
ngroups = 5
for partition in partition_into_n(seq, ngroups):
    print(' '.join([''.join(grp) for grp in partition]))

print(len(list(partition_into_n(seq, ngroups))))

which prints

A B C D EFGHIJKLMNOPQRST
A B C DE FGHIJKLMNOPQRST
A B C DEF GHIJKLMNOPQRST
A B C DEFG HIJKLMNOPQRST
...
ABCDEFGHIJKLMNO P Q RS T
ABCDEFGHIJKLMNO P QR S T
ABCDEFGHIJKLMNO PQ R S T
ABCDEFGHIJKLMNOP Q R S T
3876
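The same cut-position idea can be checked against the small A to E example from the question. The following is a minimal sketch, not the answer's exact code: `split_at` and `partitions` are illustrative names, each group is built eagerly as a slice rather than lazily with `map`, and `math.comb` assumes Python 3.8 or later.

```python
import itertools
import math

def split_at(seq, cuts):
    """Slice seq at the given sorted cut positions."""
    bounds = (0, *cuts, len(seq))
    return [seq[i:j] for i, j in zip(bounds, bounds[1:])]

def partitions(seq, n):
    """Yield every order-preserving split of seq into n non-empty groups."""
    # Choosing n-1 distinct cut positions from 1..len(seq)-1 fixes one partition.
    for cuts in itertools.combinations(range(1, len(seq)), n - 1):
        yield split_at(seq, cuts)

result = list(partitions("ABCDE", 2))
print(result)
# [['A', 'BCDE'], ['AB', 'CDE'], ['ABC', 'DE'], ['ABCD', 'E']]

# The number of partitions is the binomial coefficient C(len(seq)-1, n-1),
# so the count can be obtained without enumerating anything.
print(math.comb(19, 4))
# 3876
```

Because only the C(len(seq)-1, n-1) wanted partitions are ever generated, none of the other 2**(len(seq)-1) groupings need to be built and filtered.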