From a list of linked pairs, I want to combine the pairs into groups of common IDs, so that I can write the group_ids back to the database, e.g.:
UPDATE table SET group = n WHERE id IN (...........);
Example:
[(1,2), (3, 4), (1, 5), (6, 3), (7, 8)]
becomes
[[1, 2, 5], [3, 4, 6], [7, 8]]
which allows:
UPDATE table SET group = 1 WHERE id IN (1, 2, 5);
UPDATE table SET group = 2 WHERE id IN (3, 4, 6);
UPDATE table SET group = 3 WHERE id IN (7, 8);
and
[(1,2), (3, 4), (1, 5), (6, 3), (7, 8), (5, 3)]
becomes
[[1, 2, 5, 3, 4, 6], [7, 8]]
which allows:
UPDATE table SET group = 1 WHERE id IN (1, 2, 5, 3, 4, 6);
UPDATE table SET group = 2 WHERE id IN (7, 8);
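Going from a list of groups to the statements above could look roughly like the following. This is only a minimal sketch: build_update_statements and its table and column parameters are hypothetical names, and the SQL is assembled as plain text for illustration. Note that table and group are placeholders here; both are reserved words in most SQL dialects, so real names would need quoting.

```python
def build_update_statements(groups, table="table", column="group"):
    """Return one UPDATE statement per group, numbering groups from 1."""
    statements = []
    for group_id, members in enumerate(groups, start=1):
        # Cast to int so only numeric literals end up in the SQL text.
        id_list = ", ".join(str(int(member)) for member in members)
        statements.append(
            "UPDATE {} SET {} = {} WHERE id IN ({});".format(
                table, column, group_id, id_list
            )
        )
    return statements


for statement in build_update_statements([[1, 2, 5], [3, 4, 6], [7, 8]]):
    print(statement)
```

With groups containing hundreds of thousands of ids, the IN list would probably have to be split into batches; the sketch does not attempt that.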
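I have written some code that works. I pass in a list of tuples, where each tuple is a pair of linked IDs. I return a list of lists, where each inner list is a list of common IDs.
I loop over the list of tuples and assign each element of each tuple to a group, as shown in the function below.
I am expecting millions of linked pairs, with hundreds of thousands of groups and hundreds of thousands of group members. So I need a fast algorithm, and I am looking for suggestions for really efficient code. Although I have implemented this to build a list of lists, I am open to anything; the key is being able to build the SQL above to get the group IDs back into the database.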
def group_pairs(list_of_pairs):
    """
    Group linked ids into lists of connected ids.

    :param list_of_pairs: iterable of 2-tuples, each a pair of linked ids
    :return: list of lists, each inner list holding the ids of one group
    """
    groups = list()
    for pair in list_of_pairs:
        a_group = None
        b_group = None
        for group in groups:
            # find what group if any a and b belong to
            # don't bother checking if a group already found
            if a_group is None and pair[0] in group:
                a_group = group
            # don't bother checking if b group already found
            if b_group is None and pair[1] in group:
                b_group = group
            # if a and b found, stop looking
            if a_group is not None and b_group is not None:
                break
        if a_group is None:
            if b_group is None:
                # neither a nor b are in a group; create a new group and
                # add a and b
                groups.append([pair[0], pair[1]])
            else:
                # b is in a group but a isn't, so add a to the b group
                b_group.append(pair[0])
        elif a_group != b_group:
            if b_group is None:
                # a is in a group but b isn't, so add b to the a group
                a_group.append(pair[1])
            else:
                # a and b are in different groups; merge b into a and get
                # rid of b
                a_group.extend(b_group)
                groups.remove(b_group)
        elif a_group == b_group:
            # a and b already in same group, so nothing to do
            pass
    return groups
Answer 0 (score: 3)
Using:
def make_equiv_classes(pairs):
    groups = {}
    for (x, y) in pairs:
        xset = groups.get(x, set([x]))
        yset = groups.get(y, set([y]))
        jset = xset | yset
        for z in jset:
            groups[z] = jset
    return set(map(tuple, groups.values()))
you get:
>>> make_equiv_classes([(1,2), (3, 4), (1, 5), (6, 3), (7, 8)])
{(1, 2, 5), (3, 4, 6), (8, 7)}
>>> make_equiv_classes([(1,2), (3, 4), (1, 5), (6, 3), (7, 8), (5, 3)])
{(1, 2, 3, 4, 5, 6), (8, 7)}
The complexity should be O(np), where n is the number of distinct integer values and p is the number of pairs.
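I think set is the right type for an individual group, because it makes the union operation fast and easy to express, and dict is the right way to store groups, because you get constant-time lookup when asking which group a particular integer value belongs to.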
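As a hedged alternative that is not the method used in this answer, a disjoint-set (union-find) structure keeps the constant-time dict lookup but avoids rebuilding a merged set for every pair. The sketch below is only an illustration: the name group_pairs_union_find and its internals are hypothetical, and it has not been timed against the functions shown here.

```python
def group_pairs_union_find(pairs):
    """Group linked ids with a disjoint-set forest (union-find)."""
    parent = {}  # maps each id to its parent; a root is its own parent

    def find(x):
        # Register unseen ids as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merge the two trees by attaching one root to the other.
            parent[root_b] = root_a

    # Collect the members of each tree under its root.
    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())


print(group_pairs_union_find([(1, 2), (3, 4), (1, 5), (6, 3), (7, 8)]))
```

The order of groups and of ids within a group is not guaranteed, so results are best compared after sorting.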
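If we like, we can set up a test harness to time this code. First, we can build a random graph over something moderately large, e.g. 10K nodes (i.e., distinct integer values). I'll put in 5K random links (pairs), since that tends to give me thousands of groups that together account for about two-thirds of the nodes (that is, about 3K nodes will be in singleton groups, not linked to anything else).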
import random
pairs = []
while len(pairs) < 5000:
    a = random.randint(1,10000)
    b = random.randint(1,10000)
    if a != b:
        pairs.append((a,b))
Then we can time it (I'm using IPython magic here):
In [48]: %timeit c = make_equiv_classes(pairs)
10 loops, best of 3: 63 ms per loop
which is faster than the initial solution:
In [49]: %timeit c = group_pairs(pairs)
1 loop, best of 3: 2.08 s per loop
We can also use this random graph to check that the two functions produce the same output on a large random input:
>>> c = make_equiv_classes(pairs)
>>> c2 = group_pairs(pairs)
>>> set(tuple(sorted(x)) for x in c) == set(tuple(sorted(x)) for x in c2)
True
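For readers not using IPython, the timing step could be reproduced with the standard-library timeit module. This is a minimal sketch: random_pairs and its seed argument are hypothetical additions for repeatability, and make_equiv_classes is copied from above so that the script runs on its own. Timings will differ from machine to machine.

```python
import random
import timeit


def make_equiv_classes(pairs):
    # Same function as in this answer, repeated so the script is standalone.
    groups = {}
    for (x, y) in pairs:
        xset = groups.get(x, set([x]))
        yset = groups.get(y, set([y]))
        jset = xset | yset
        for z in jset:
            groups[z] = jset
    return set(map(tuple, groups.values()))


def random_pairs(n_pairs=5000, n_nodes=10000, seed=0):
    """Build n_pairs random links between distinct ids in 1..n_nodes."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    pairs = []
    while len(pairs) < n_pairs:
        a = rng.randint(1, n_nodes)
        b = rng.randint(1, n_nodes)
        if a != b:
            pairs.append((a, b))
    return pairs


pairs = random_pairs()
# Best of 3 repeats with 10 calls each, reported per call.
runs = timeit.repeat(lambda: make_equiv_classes(pairs), number=10, repeat=3)
best = min(runs) / 10
print("make_equiv_classes: {:.1f} ms per call".format(best * 1000))
```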