N选择列表的N / 2个子列表

时间:2016-03-16 02:07:20

标签: python list permutation combinatorics

在Python中是否有一种有效的方法可以将大小为n的列表的所有分区分成两个大小为n/2的子集?我想得到一些迭代构造,使得每次迭代提供原始列表的两个非重叠子集,每个子​​集的大小为n/2

例如:

A = [1,2,3,4,5,6]    # here n = 6
# some iterative construct
    # in each iteration, a pair of subsets of size n/2
    # subsets = [[1,3,4], [2,5,6]] for example for one of the iterations
    # subsets = [[1,2,5],[3,4,6]] a different iteration example

子集应该是非重叠的,例如[[1,2,3], [4,5,6]]有效但[[1,2,3], [3,4,5]]不有效。两个子集的顺序无关紧要,例如, [[1,2,3], [4,5,6]]不算与[[4,5,6], [1,2,3]]不同,因此这两个中只有一个应该出现在迭代中。每个子集中的顺序也无关紧要,因此[[1,2,3], [4,5,6]][[1,3,2], [4,5,6]][[3,2,1], [6,5,4]]等都算相同,因此只有其中一个应该在整个迭代中出现。< / p>

5 个答案:

答案 0 :(得分:5)

您需要使用itertools.combinations来执行此操作。输入是您要选择项目的列表,第二个是要选择的项目数。

result = [list(item) for item in itertools.combinations(input, len(input) // 2)]

对于[1,2,3,4]的输入,这会产生

[[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]

作为@ShadowRanger pointed out,如果订单在您的列表中很重要并且您想要所有排列,那么您需要将itertools.permutations替换为解决方案。

result = [list(item) for item in itertools.permutations(input, len(input) // 2)]
# [[1, 2], [1, 3], [1, 4], [2, 1], [2, 3], [2, 4], [3, 1], [3, 2], [3, 4], [4, 1], [4, 2], [4, 3]]

修改

在仔细阅读你的问题后,目前还不清楚你是否想要我所展示的所有n/2排列,或者你想要一个lits列表,其中每个元素还是另外两个列表中的两个“排列的一半。

要完成此操作,您可以执行以下操作(包含一些索引帮助from @Blckknght

result = [[list(item[::2]), list(item[1::2])] for item in itertools.permutations(input)]

在这种情况下,[1,2,3,4]的输出将是

[[[1, 3], [2, 4]], [[1, 4], [2, 3]], [[1, 2], [3, 4]], [[1, 4], [3, 2]], [[1, 2], [4, 3]], [[1, 3], [4, 2]], [[2, 3], [1, 4]], [[2, 4], [1, 3]], [[2, 1], [3, 4]], [[2, 4], [3, 1]], [[2, 1], [4, 3]], [[2, 3], [4, 1]], [[3, 2], [1, 4]], [[3, 4], [1, 2]], [[3, 1], [2, 4]], [[3, 4], [2, 1]], [[3, 1], [4, 2]], [[3, 2], [4, 1]], [[4, 2], [1, 3]], [[4, 3], [1, 2]], [[4, 1], [2, 3]], [[4, 3], [2, 1]], [[4, 1], [3, 2]], [[4, 2], [3, 1]]]

<强> EDIT2

由于顺序并不重要,但你想要一种类似于最后一种方法的方法(列表列表),由于数组切片,这对于最后一种方法来说有点棘手。一种替代方法是使用setfrozenset来构造初始信息(而不是列表),因为在set中,在检查相等性时排序无关紧要。这将自动允许我们删除重复项。然后我们可以添加一个额外的步骤来转换回列表,如果这是你喜欢的。

from itertools import permutations
tmp = set([frozenset([frozenset(k[::2]),frozenset(k[1::2])]) for k in permutations(input)]) 
result = [[list(el) for el in item] for item in tmp];

这将产生

[[[1, 2], [3, 4]], [[2, 3], [1, 4]], [[1, 3], [2, 4]]]

答案 1 :(得分:4)

这是一个不使用itertools的解决方案。它使用一种名为Gosper's hack的技巧来生成位置换。有关其工作原理的说明,请参阅HAKMEM Item 175;维基百科文章Combinatorial number system中也提到了这种黑客行为。它在这个SO问题的接受答案中有所体现:Iterating over all subsets of a given size

parts函数是一个生成器,因此您可以在for循环中使用它,如我的测试所示。

工作原理。

要将长度 n 的列表分成长度为 n / 2 的子列表对,我们使用由 n /组成的二进制数bits 2 零位, n / 2 一位。给定位置的零位表示相应的列表元素进入左子列表,给定位置的一位表示相应的列表元素进入右子列表。

最初,bits设置为2 ** (n/2) - 1,因此,如果 n = 6,bits将以000111开头。

生成器使用Gosper的hack以数字顺序置换bits,当我们在最高位置获得一位时停止,因为那时我们开始获得子列表对的反转版本。

负责将bit中的模式转换为一对子列表的代码是:

    for i, u in enumerate(lst):
        ss[bits & (1<<i) == 0].append(u)

如果i中位位置bits为零,则ss[0]会从lst获取当前项,否则会附加到ss[1]

此代码在Python 2和Python 3上运行。

from __future__ import print_function

def parts(lst):
    ''' Generate all pairs of equal-sized partitions of a list of even length '''

    n = len(lst)
    if n % 2 != 0:
        raise ValueError('list length MUST be even')

    lim = 1 << (n - 1)
    bits = (1 << n // 2) - 1

    while bits < lim:
        #Use bits to partition lst
        ss = [[], []]
        for i, u in enumerate(lst):
            ss[bits & (1<<i) == 0].append(u)
        yield ss

        #Calculate next bits permutation via Gosper's hack (HAKMEM #175)
        u = bits & (-bits)
        v = bits + u
        bits = v | (((v ^ bits) // u) >> 2)

# Test
lst = list(range(1, 7))
for i, t in enumerate(parts(lst), 1):
    print('{0:2d}: {1}'.format(i, t))    

<强>输出

 1: [[1, 2, 3], [4, 5, 6]]
 2: [[1, 2, 4], [3, 5, 6]]
 3: [[1, 3, 4], [2, 5, 6]]
 4: [[2, 3, 4], [1, 5, 6]]
 5: [[1, 2, 5], [3, 4, 6]]
 6: [[1, 3, 5], [2, 4, 6]]
 7: [[2, 3, 5], [1, 4, 6]]
 8: [[1, 4, 5], [2, 3, 6]]
 9: [[2, 4, 5], [1, 3, 6]]
10: [[3, 4, 5], [1, 2, 6]]

我承认使用像Gosper的黑客那样难以理解的东西并不完全是Pythonic。 :)

以下是将parts的输出捕获到所有子列表的列表中的方法。它还说明parts可以处理字符串输入,但它会将输出生成为字符串列表。

seq = list(parts('abcd'))
print(seq)

<强>输出

[[['a', 'b'], ['c', 'd']], [['a', 'c'], ['b', 'd']], [['b', 'c'], ['a', 'd']]] 


这是另一种解决方案,使用itertools生成组合。它以与早期版本不同的顺序生成对。但是,它更短,更容易阅读。更重要的是,它的速度要快得多,我timeit测试的速度要快50到100%,具体取决于列表长度;对于较长的列表,差异似乎变小了。

def parts(lst):
    n = len(lst)
    if n % 2 != 0:
        raise ValueError('list length MUST be even')

    first = lst[0]
    for left in combinations(lst, n // 2):
        if left[0] != first:
            break
        right = [u for u in lst if u not in left]
        yield [list(left), right]

# Test
lst = list(range(1, 7))
for i, t in enumerate(parts(lst), 1):
    print('{0:2d}: {1}'.format(i, t))    

<强>输出

 1: [[1, 2, 3], [4, 5, 6]]
 2: [[1, 2, 4], [3, 5, 6]]
 3: [[1, 2, 5], [3, 4, 6]]
 4: [[1, 2, 6], [3, 4, 5]]
 5: [[1, 3, 4], [2, 5, 6]]
 6: [[1, 3, 5], [2, 4, 6]]
 7: [[1, 3, 6], [2, 4, 5]]
 8: [[1, 4, 5], [2, 3, 6]]
 9: [[1, 4, 6], [2, 3, 5]]

答案 2 :(得分:1)

这是一个基于itertools的生成器,我认为它可以生成您想要的值。

def sub_lists(sequence):
    all_but_first = set(sequence[1:])
    for item in itertools.combinations(sequence[1:], len(sequence)//2 - 1):
        yield [[sequence[0]] + list(item), list(all_but_first.difference(item))]

与Suever的答案中基于permutations的方法相比,我以两种方式避免了接近重复的输出。首先,我通过强制所有结果在第一个子列表中具有输入序列的第一个值来避免同时产生[["a", "b"], ["c", "d"]][["c", "d"], ["a", "b"]]。我通过使用set-subttraction构建第二个子列表来避免产生[["a", "b"], ["c", "d"]][["a", "b"], ["d", "c"]]

请注意,产生嵌套元组可能比嵌套列表更自然。要做到这一点,只需将最后一行更改为:

yield (sequence[0],) + item, tuple(all_but_first.difference(item))

答案 3 :(得分:1)

由于所有订单都不重要,但是我们正在列出列表列表(其中顺序本身很重要),我们可以假设一些不变量:在所有对中,第一对中的第一个元素是1,并且两者都是一对中的列表按排序顺序排列。

A = [1,2,3,4,5,6]

from itertools import combinations
first, rest = A[0], A[1:]
result = [
            [
                list((first,) + X), 
                [x for x in rest if x not in X]
            ] 
            for X in combinations(rest, len(A)/2 - 1)
         ]

答案 4 :(得分:0)

这里有一些代码可以针对此问题的各种解决方案执行timeit测试。

为了使比较公平,我在我的函数中评论了参数检查测试。

我还添加了一个功能parts_combo_set,它将我基于组合的答案的功能与Blckknght的功能相结合。它似乎是最快的,除了非常小的列表。

#!/usr/bin/env python

''' Generate all pairs of equal-sized partitions of a list of even length

    Testing the speed of the code at
    http://stackoverflow.com/q/36025609/4014959

    Written by PM 2Ring 2016.03.16
'''

from __future__ import print_function, division
from itertools import combinations
from timeit import Timer

def parts_combo(lst):
    n = len(lst)
    #if n % 2 != 0:
        #raise ValueError('list length MUST be even')

    first = lst[0]
    for left in combinations(lst, n // 2):
        if left[0] != first:
            break
        right = [u for u in lst if u not in left]
        yield [list(left), right]

def parts_combo_set(lst):
    n = len(lst)
    #if n % 2 != 0:
        #raise ValueError('list length MUST be even')

    first = lst[0]
    allset = set(lst)
    for left in combinations(lst, n // 2):
        if left[0] != first:
            break
        yield [list(left), list(allset.difference(left))]

def parts_gosper(lst):
    n = len(lst)
    #if n % 2 != 0:
        #raise ValueError('list length MUST be even')

    lim = 1 << (n - 1)
    bits = (1 << n // 2) - 1

    while bits < lim:
        #Use bits to partition lst
        ss = [[], []]
        for i, u in enumerate(lst):
            ss[bits & (1<<i) == 0].append(u)
        yield ss

        #Calculate next bits permutation via Gosper's hack (HAKMEM #175)
        u = bits & (-bits)
        v = bits + u
        bits = v | (((v ^ bits) // u) >> 2)

def sub_lists(sequence):
    all_but_first = set(sequence[1:])
    for item in combinations(sequence[1:], len(sequence)//2 - 1):
        yield [[sequence[0]] + list(item), list(all_but_first.difference(item))]

def amit(seq):
    first, rest = seq[0], seq[1:]
    return [
                [
                    list((first,) + t), 
                    [x for x in rest if x not in t]
                ]
                for t in combinations(rest, len(seq) // 2 - 1)
           ]

funcs = (
    parts_combo,
    parts_combo_set,
    parts_gosper,
    sub_lists,
    amit,
)

def rset(seq):
    fset = frozenset
    return fset([fset([fset(u),fset(v)]) for u,v in seq])

def validate():
    func = funcs[0]
    master = rset(func(lst))
    print('\nValidating against', func.func_name)
    for func in funcs[1:]:
        print(func.func_name, rset(func(lst)) == master)

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    for func in funcs:
        fname = func.func_name
        print('\n%s' % fname)
        setup = 'from __main__ import lst,' + fname
        #cmd = 'list(%s(lst))' % fname
        cmd = 'for t in %s(lst):pass' % fname
        t = Timer(cmd, setup)
        r = t.repeat(reps, loops)
        r.sort()
        print(r)


num = 6
lst = list(range(1, num + 1))
print('num =', num)

#parts = funcs[0]
#for i, t in enumerate(parts(lst), 1):
    #print('{0:2d}: {1}'.format(i, t))

validate()
time_test(10000, 3)

<强>输出

time_test(10000, 3)

num = 6

Validating against parts_combo
parts_combo_set True
parts_gosper True
sub_lists True
amit True

parts_combo
[0.58100390434265137, 0.58798313140869141, 0.59674692153930664]

parts_combo_set
[0.74442911148071289, 0.77211689949035645, 0.79338312149047852]

parts_gosper
[1.0791628360748291, 1.089813232421875, 1.1191768646240234]

sub_lists
[0.77199792861938477, 0.79007697105407715, 0.81944608688354492]

amit
[0.60080099105834961, 0.60345196723937988, 0.60417318344116211]

time_test(1000, 3)

num = 8

Validating against parts_combo
parts_combo_set True
parts_gosper True
sub_lists True
amit True

parts_combo
[0.22465801239013672, 0.22501206398010254, 0.23627114295959473]

parts_combo_set
[0.29469203948974609, 0.29857206344604492, 0.30069589614868164]

parts_gosper
[0.43568992614746094, 0.44395184516906738, 0.44651198387145996]

sub_lists
[0.31375885009765625, 0.32754802703857422, 0.37077498435974121]

amit
[0.22520613670349121, 0.22674393653869629, 0.24075913429260254]

time_test(500, 3)

num = 10

parts_combo
[0.52618098258972168, 0.52645611763000488, 0.53866386413574219]

parts_combo_set
[0.40614008903503418, 0.41370606422424316, 0.41525506973266602]

parts_gosper
[1.0068988800048828, 1.026188850402832, 1.1649439334869385]

sub_lists
[0.48507213592529297, 0.50991582870483398, 0.51528096199035645]

amit
[0.48686790466308594, 0.52898812294006348, 0.68387198448181152]

time_test(100, 3)

num = 12

parts_combo
[0.47471189498901367, 0.47522807121276855, 0.4798729419708252]

parts_combo_set
[0.3045799732208252, 0.30534601211547852, 0.35700607299804688]

parts_gosper
[0.83456206321716309, 0.83824801445007324, 0.87273812294006348]

sub_lists
[0.36697721481323242, 0.36919784545898438, 0.38349604606628418]

amit
[0.40012097358703613, 0.40033888816833496, 0.40788102149963379]

time_test(50, 3)

num = 14

parts_combo
[0.97016000747680664, 0.97931098937988281, 1.2653739452362061]

parts_combo_set
[0.81669902801513672, 0.88839983940124512, 0.91469597816467285]

parts_gosper
[1.772817850112915, 1.9343690872192383, 1.9586498737335205]

sub_lists
[0.78162002563476562, 0.79451298713684082, 0.8126368522644043]

amit
[0.89046502113342285, 0.89572596549987793, 0.91031289100646973]

time_test(50, 3)

num = 16

parts_combo
[4.1981601715087891, 4.3565289974212646, 4.3795731067657471]

parts_combo_set
[2.5452880859375, 2.5757780075073242, 2.6059379577636719]

parts_gosper
[7.5856668949127197, 7.6066100597381592, 7.6397140026092529]

sub_lists
[3.677016019821167, 3.6800520420074463, 3.7420001029968262]

amit
[4.1738030910491943, 4.1768841743469238, 4.1960680484771729]

time_test(10, 3)

num = 18

parts_combo
[3.8362669944763184, 3.8807728290557861, 4.0259079933166504]

parts_combo_set
[1.9355819225311279, 1.9540839195251465, 1.9573280811309814]

parts_gosper
[6.3178229331970215, 6.6125278472900391, 7.0462160110473633]

sub_lists
[2.1632919311523438, 2.238231897354126, 2.2747220993041992]

amit
[3.6137850284576416, 3.6162960529327393, 3.6475629806518555]

time_test(10, 3)

num = 20

parts_combo
[16.874133110046387, 17.585763931274414, 19.725590944290161]

parts_combo_set
[7.5462148189544678, 7.5597691535949707, 7.8375740051269531]

parts_gosper
[27.312526941299438, 27.637516021728516, 28.016690015792847]

sub_lists
[7.7865769863128662, 7.8874318599700928, 8.5498230457305908]

amit
[15.554526805877686, 15.626868009567261, 16.224159002304077]

这些测试是在2GHz单核机器上进行的,其中2 GB的RAM运行Python 2.6.6。