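Combine a list of pairs (tuples)?

Date: 2016-10-07 10:46:19

Tags: python list

From a list of linked pairs, I want to combine the pairs into groups of common IDs, so that I can write the group_ids back to the database, e.g.: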

UPDATE table SET group = n WHERE id IN (...........);

Example:

[(1,2), (3, 4), (1, 5), (6, 3), (7, 8)]

becomes

[[1, 2, 5], [3, 4, 6], [7, 8]]

which allows:

UPDATE table SET group = 1 WHERE id IN (1, 2, 5);
UPDATE table SET group = 2 WHERE id IN (3, 4, 6);
UPDATE table SET group = 3 WHERE id IN (7, 8);

[(1,2), (3, 4), (1, 5), (6, 3), (7, 8), (5, 3)]

becomes

[[1, 2, 5, 3, 4, 6], [7, 8]]

which allows:

UPDATE table SET group = 1 WHERE id IN (1, 2, 5, 3, 4, 6);
UPDATE table SET group = 2 WHERE id IN (7, 8);

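I've written some code that works. I pass in a list of tuples, where each tuple is a pair of linked IDs. I return a list of lists, where each inner list is a list of common IDs.

I iterate over the list of tuples and assign each tuple element to a group, as follows:

  • If neither a nor b is in a group, create a new list, append a and b, and append the new list to the list of lists
  • If a is in a group but b is not, add b to a's group
  • If b is in a group but a is not, add a to b's group
  • If a and b are already in separate groups, merge a's and b's groups
  • If a and b are already in the same group, do nothing

I'm expecting millions of linked pairs, and I'm expecting hundreds of thousands of groups and hundreds of thousands of group members. So I need a fast algorithm, and I'm looking for suggestions for some really efficient code. Although I've implemented this to build a list of lists, I'm open to anything; the key is being able to build the SQL above to get the group IDs back into the database.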

def group_pairs(list_of_pairs):
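    """
    Group linked ids into lists of mutually connected ids.

    :param list_of_pairs: list of (a, b) tuples, each a pair of linked ids
    :return: list of lists, each inner list holding the ids of one group
    """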
    groups = list()
    for pair in list_of_pairs:
        a_group = None
        b_group = None

        for group in groups:
            # find what group if any a and b belong to

            # don't bother checking if a group already found
            if a_group is None and pair[0] in group:
                a_group = group
            # don't bother checking if b group already found
            if b_group is None and pair[1] in group:
                b_group = group
            # if a and b found, stop looking
            if a_group is not None and b_group is not None:
                break

        if a_group is None:
            if b_group is None:
                # neither a nor b are in a group; create a new group and
                # add a and b
                groups.append([pair[0], pair[1]])
            else:
                # b is in a group but a isn't, so add a to the b group
                b_group.append(pair[0])
        elif a_group != b_group:
            if b_group is None:
                # a is in a group but b isn't, so add b to the a group
                a_group.append(pair[1])
            else:
                # a and b are in different groups, so merge b's group into
                # a's group and get rid of b's group
                a_group.extend(b_group)
                groups.remove(b_group)
        elif a_group == b_group:
            # a and b already in same group, so nothing to do
            pass

    return groups

1 Answer:

Answer 0: (score: 3)

Use:

def make_equiv_classes(pairs):
    groups = {}
    for (x, y) in pairs:
        xset = groups.get(x, set([x]))
        yset = groups.get(y, set([y]))
        jset = xset | yset
        for z in jset:
            groups[z] = jset
    return set(map(tuple, groups.values()))

and you get:

>>> make_equiv_classes([(1,2), (3, 4), (1, 5), (6, 3), (7, 8)])
{(1, 2, 5), (3, 4, 6), (8, 7)}

>>> make_equiv_classes([(1,2), (3, 4), (1, 5), (6, 3), (7, 8), (5, 3)])
{(1, 2, 3, 4, 5, 6), (8, 7)}

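The complexity should be O(np) in the worst case, where n is the number of distinct integer values and p is the number of pairs, since each pair may update the dict entry of every member of the merged set.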

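I think a set is the right type for a single group, because it makes the union operation fast and easy to express, and a dict is the right way to store groups, because you get constant-time lookup when asking which group a particular integer value belongs to.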
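With millions of pairs, the O(np) worst case could start to matter. A different technique, not the one used above, is a disjoint-set (union-find) structure with path compression and union by size, which runs in close to linear time in the number of pairs and ids. The following is only a minimal sketch; the name group_pairs_union_find is made up for illustration.

```python
def group_pairs_union_find(pairs):
    """Group linked ids using a disjoint-set (union-find) structure.

    Hypothetical alternative to make_equiv_classes; returns a list of lists.
    """
    parent = {}  # maps each id to its parent; a root maps to itself
    size = {}    # maps each root to the number of ids in its tree

    def find(x):
        # Register unseen ids as singleton roots.
        if x not in parent:
            parent[x] = x
            size[x] = 1
            return x
        # Walk up to the root of x's tree.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already in the same group
        # Union by size: attach the smaller tree under the larger one.
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    # Collect the members of each tree under its root.
    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())
```

The order of the groups and of the ids inside each group is not meant to be relied on; sort them if a stable order is needed.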

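If we want, we can set up a test harness to time make_equiv_classes. First, we can build a random graph on something moderately large, say 10K nodes (i.e., distinct integer values). I'll put in 5K random links (pairs), since that tends to give me thousands of groups that together account for about two-thirds of the nodes (that is, about 3K nodes will be in singleton groups, not linked to anything else).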

import random
pairs = []
while len(pairs) < 5000:
    a = random.randint(1,10000)
    b = random.randint(1,10000)
    if a != b:
        pairs.append((a,b))

Then we can time it (I'm using IPython magic here):

In [48]: %timeit c = make_equiv_classes(pairs)
10 loops, best of 3: 63 ms per loop

which is faster than the initial solution:

In [49]: %timeit c = group_pairs(pairs)
1 loop, best of 3: 2.08 s per loop

We can also use this random graph to check that the two functions give the same output on a large random input:

>>> c = make_equiv_classes(pairs)
>>> c2 = group_pairs(pairs)
>>> set(tuple(sorted(x)) for x in c) == set(tuple(sorted(x)) for x in c2)
True
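To get from the groups back to the database, one option is to number the groups and send one (group id, member id) row per member in a single batch, rather than building very long IN (...) lists. The sketch below uses the standard-library sqlite3 module purely for illustration; the table name items, the column name group_id, and the function write_group_ids are assumptions, not part of the question's schema.

```python
import sqlite3


def write_group_ids(conn, groups):
    """Number the groups 1, 2, 3, ... and write the ids back in one batch.

    Assumes a hypothetical table items(id, group_id).
    """
    rows = [
        (group_id, member)
        for group_id, members in enumerate(groups, start=1)
        for member in members
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("UPDATE items SET group_id = ? WHERE id = ?", rows)


# Small in-memory demonstration with the ids from the question's first example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, group_id INTEGER)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 9)])
write_group_ids(conn, [[1, 2, 5], [3, 4, 6], [7, 8]])
result = conn.execute("SELECT id, group_id FROM items ORDER BY id").fetchall()
```

Whether per-row updates or batched IN (...) lists are faster will depend on the database and driver, so this is worth measuring on the real data.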