Say I have a list L. How can I get an iterator over all partitions of L into K groups?
Example: L = [2, 3, 5, 7, 11, 13], K = 3
The list of all possible partitions into 3 groups includes:
[ [ 2 ], [ 3, 5], [ 7,11,13] ]
[ [ 2,3,5 ], [ 7, 11], [ 13] ]
[ [ 3, 11 ], [ 5, 7], [ 2, 13] ]
[ [ 3 ], [ 11 ], [ 5, 7, 2, 13] ]
etc...
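For context, the number of ways to partition an n-element list into exactly k non-empty groups is the Stirling number of the second kind S(n, k). A minimal recursive sketch (the function name `stirling2` is just for illustration):

```python
# Count the partitions of an n-element set into exactly k non-empty
# groups: the Stirling number of the second kind S(n, k).
def stirling2(n, k):
    # Recurrence: S(n, k) = k * S(n-1, k) + S(n-1, k-1)
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(6, 3))  # 90 partitions of a 6-element list into 3 groups
```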
=== Update ===
I worked out a solution that seems to work, so I am just copy-pasting it here.
# -*- coding: utf-8 -*-
import itertools

# return ( list1 - list0 )
def l1_sub_l0(l1, l0):
    """Subtract two lists"""
    copy_l1 = list(l1)
    copy_l0 = list(l0)
    for xx in l0:
        if copy_l1.count(xx) > 0:
            copy_l1.remove(xx)
            copy_l0.remove(xx)
    return [copy_l1, copy_l0]

def gen_group_len(n, k):
    """Generate all possible group sizes"""
    # avoid duplicates
    stop_list = []
    for t in itertools.combinations_with_replacement(range(1, n - 1), k - 1):
        last_n = n - sum(t)
        # valid group size
        if last_n >= 1:
            res = tuple(sorted(t + (last_n,)))
            if res not in stop_list:
                yield res
                stop_list.append(res)

# group_len = (1, 1, 3)
def gen(group_len, my_list):
    """Generate all possible partitions for all possible group sizes"""
    if len(group_len) == 1:
        yield (tuple(my_list),)
    else:
        # a stop list is needed if 2 groups have the same size
        stop_list = []
        for t in itertools.combinations(my_list, group_len[0]):
            reduced_list = l1_sub_l0(my_list, t)[0]
            for t2 in gen(group_len[1:], reduced_list):
                tmp = set((t, t2[0]))
                if tmp not in stop_list:
                    yield (t,) + t2
                    # avoid doing the same thing twice
                    if group_len[1] == group_len[0]:
                        stop_list.append(tmp)

my_list = [3, 5, 7, 11, 13]
n = len(my_list)
k = 3

group_len_list = list(gen_group_len(n, k))
print("for %i elements, %i configurations of group sizes" % (n, len(group_len_list)))
print(group_len_list)

for group_len in group_len_list:
    print("group sizes", group_len)
    for x in gen(group_len, my_list):
        print(x)
    print("===")
Output:
for 5 elements, 2 configurations of group sizes
[(1, 1, 3), (1, 2, 2)]
group sizes (1, 1, 3)
((3,), (5,), (7, 11, 13))
((3,), (7,), (5, 11, 13))
((3,), (11,), (5, 7, 13))
((3,), (13,), (5, 7, 11))
((5,), (7,), (3, 11, 13))
((5,), (11,), (3, 7, 13))
((5,), (13,), (3, 7, 11))
((7,), (11,), (3, 5, 13))
((7,), (13,), (3, 5, 11))
((11,), (13,), (3, 5, 7))
===
group sizes (1, 2, 2)
((3,), (5, 7), (11, 13))
((3,), (5, 11), (7, 13))
((3,), (5, 13), (7, 11))
((5,), (3, 7), (11, 13))
((5,), (3, 11), (7, 13))
((5,), (3, 13), (7, 11))
((7,), (3, 5), (11, 13))
((7,), (3, 11), (5, 13))
((7,), (3, 13), (5, 11))
((11,), (3, 5), (7, 13))
((11,), (3, 7), (5, 13))
((11,), (3, 13), (5, 7))
((13,), (3, 5), (7, 11))
((13,), (3, 7), (5, 11))
((13,), (3, 11), (5, 7))
===
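As a sanity check, the two size configurations above contribute 10 + 15 = 25 partitions in total. An independent brute-force count (a throwaway sketch, not part of the solution above) agrees: assign one of 3 labels to each element and de-duplicate.

```python
from itertools import product

# Independent brute-force count: assign one of 3 labels to each of the
# 5 elements, keep only assignments using all 3 labels, and de-duplicate
# by treating a partition as a set of frozensets.
elements = [3, 5, 7, 11, 13]
seen = set()
for labels in product(range(3), repeat=len(elements)):
    groups = [frozenset(e for e, lab in zip(elements, labels) if lab == g)
              for g in range(3)]
    if all(groups):  # every group non-empty
        seen.add(frozenset(groups))
print(len(seen))  # 25 = 10 + 15, matching the listing above
```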
Answer 0 (score: 3)
This works, though it is probably grossly inefficient (the partial results are sorted to avoid double-counting):
def clusters(l, K):
    if l:
        prev = None
        for t in clusters(l[1:], K):
            tup = sorted(t)
            if tup != prev:
                prev = tup
                for i in range(K):
                    yield tup[:i] + [[l[0]] + tup[i]] + tup[i+1:]
    else:
        yield [[] for _ in range(K)]
It also returns empty clusters, so you may want to wrap it to get only the non-empty ones:
def neclusters(l, K):
    for c in clusters(l, K):
        if all(x for x in c):
            yield c
Counting just to check:
def kamongn(n, k):
    res = 1
    for x in range(n - k, n):
        res *= x + 1
    for x in range(k):
        res //= x + 1
    return res

def Stirling(n, k):
    res = 0
    for j in range(k + 1):
        res += (-1) ** (k - j) * kamongn(k, j) * j ** n
    for x in range(k):
        res //= x + 1
    return res
>>> sum(1 for _ in neclusters([2,3,5,7,11,13], K=3)) == Stirling(len([2,3,5,7,11,13]), k=3)
True
It works!
Output:
>>> clust = neclusters([2,3,5,7,11,13], K=3)
>>> [next(clust) for _ in range(5)]
[[[2, 3, 5, 7], [11], [13]], [[3, 5, 7], [2, 11], [13]], [[3, 5, 7], [11], [2, 13]], [[2, 3, 11], [5, 7], [13]], [[3, 11], [2, 5, 7], [13]]]
Answer 1 (score: 1)
Edited: as @moose pointed out, the following only determines partitions in which contiguous indices end up in the same cluster. Applying this partitioning over all permutations of the list gives the answer sought.
itertools is very useful for listing combinations of this kind. First, we view your task as the equivalent problem of selecting all sets of k-1 distinct split points in the array. This is solved by itertools.combinations, which returns combinations without replacement of a given size from an iterable, in the order in which they appear in the original iterable.
Your problem is thus solved by the following:
import itertools

def neclusters(l, K):
    for splits in itertools.combinations(range(len(l) - 1), K - 1):
        # splits need to be offset by 1, and padded
        splits = [0] + [s + 1 for s in splits] + [None]
        yield [l[s:e] for s, e in zip(splits, splits[1:])]
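For illustration, here is a self-contained run of the split-point version (the definition is repeated so the snippet stands alone); it yields only the contiguous partitions:

```python
import itertools

# Same split-point idea as above, repeated here so the demo is self-contained.
def neclusters(l, K):
    for splits in itertools.combinations(range(len(l) - 1), K - 1):
        splits = [0] + [s + 1 for s in splits] + [None]
        yield [l[s:e] for s, e in zip(splits, splits[1:])]

out = list(neclusters([2, 3, 5, 7, 11, 13], 3))
print(out[0])    # [[2], [3], [5, 7, 11, 13]]
print(len(out))  # C(5, 2) = 10 contiguous partitions
```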
numpy's split function is designed to produce these partitions given the split offsets, so here is an alternative that generates lists of numpy arrays:
import itertools
import numpy as np

def neclusters(l, K):
    for splits in itertools.combinations(range(len(l) - 1), K - 1):
        yield np.split(l, 1 + np.array(splits))
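The edit note above suggests running the contiguous split over all permutations to recover every partition. A brute-force sketch of that idea (the function name `all_partitions` and the frozenset de-duplication are my own, suitable for small lists only):

```python
import itertools

# Run the contiguous split over every permutation of the input, then
# de-duplicate by viewing a partition as a frozenset of frozensets.
# Brute force: len(l)! * C(len(l)-1, K-1) candidates, so small lists only.
def all_partitions(l, K):
    seen = set()
    for perm in itertools.permutations(l):
        for splits in itertools.combinations(range(len(perm) - 1), K - 1):
            bounds = [0] + [s + 1 for s in splits] + [None]
            parts = [perm[s:e] for s, e in zip(bounds, bounds[1:])]
            key = frozenset(frozenset(p) for p in parts)
            if key not in seen:
                seen.add(key)
                yield [list(p) for p in parts]

print(sum(1 for _ in all_partitions([2, 3, 5, 7], 2)))  # 7 = S(4, 2)
```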
Answer 2 (score: 1)
A simple alternative view of this problem is the assignment of one of the k cluster labels to each element.
import itertools

def neclusters(l, k):
    for labels in itertools.product(range(k), repeat=len(l)):
        partition = [[] for i in range(k)]
        for i, label in enumerate(labels):
            partition[label].append(l[i])
        yield partition
As in @val's answer, this can be wrapped to remove partitions with empty clusters.
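One way to do that wrapping (a sketch; the `unique_neclusters` helper is my own, and it also de-duplicates partitions that differ only by label order, since each partition otherwise appears k! times):

```python
import itertools

# The label-assignment generator, repeated so the snippet is self-contained.
def neclusters(l, k):
    for labels in itertools.product(range(k), repeat=len(l)):
        partition = [[] for i in range(k)]
        for i, label in enumerate(labels):
            partition[label].append(l[i])
        yield partition

# Drop partitions with empty clusters and de-duplicate label permutations.
def unique_neclusters(l, k):
    seen = set()
    for p in neclusters(l, k):
        if all(p):
            key = frozenset(frozenset(group) for group in p)
            if key not in seen:
                seen.add(key)
                yield p

print(sum(1 for _ in unique_neclusters([2, 3, 5, 7, 11, 13], 3)))  # 90 = S(6, 3)
```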
Answer 3 (score: 1)
Use more_itertools.partitions (note the trailing "s") and filter the partitions of size k:
Given
import more_itertools as mit

iterable = [2, 3, 5, 7, 11, 13]
k = 3
Demo
list(filter(lambda x: len(x) == k, mit.partitions(iterable)))
# [[[2], [3], [5, 7, 11, 13]],
#  [[2], [3, 5], [7, 11, 13]],
#  [[2], [3, 5, 7], [11, 13]],
#  [[2], [3, 5, 7, 11], [13]],
#  [[2, 3], [5], [7, 11, 13]],
#  [[2, 3], [5, 7], [11, 13]],
#  [[2, 3], [5, 7, 11], [13]],
#  [[2, 3, 5], [7], [11, 13]],
#  [[2, 3, 5], [7, 11], [13]],
#  [[2, 3, 5, 7], [11], [13]]]
Note: more_itertools is a third-party package. Install via pip install more_itertools.