Question

I am trying to design an efficient algorithm for grouping a sequence of integer tuples of arbitrary length, such as:

[(), (1,), (1,1), (1,2), (2,), (2,1,1), (2,1,2), (2,2)]

The grouping rule, expressed in Python for example, is as follows:

def tupleSameGroup(tuple1, tuple2):
    sameGroup = True
    for index in range(min(len(tuple1), len(tuple2))):
        if tuple1[index] != tuple2[index]:
            sameGroup = False

    return sameGroup

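Roughly speaking, two tuples are in the same group if one is a "subset" of the other, matching from the start (that is, one is a prefix of the other). The empty tuple is in the same group as every tuple.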
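The rule above amounts to a prefix check, so it can also be written with slicing. This is only a minimal sketch; the name same_group is illustrative and does not appear in the question:

```python
def same_group(tuple1, tuple2):
    """Return True if one tuple is a prefix of the other."""
    n = min(len(tuple1), len(tuple2))
    # Compare only the first n elements of each tuple.
    return tuple1[:n] == tuple2[:n]


print(same_group((), (2, 1)))      # True
print(same_group((1,), (1, 2)))    # True
print(same_group((1, 1), (1, 2)))  # False
```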

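Based on this rule, I want my algorithm to output a list of all unique groups of tuples: a list of lists of tuples in which all tuples within an inner list are in the same group, but between any two inner lists there is a pair of tuples that is not. For the example above, the desired output is: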

[[(), (1,), (1,1)],
 [(), (1,), (1,2)],
 [(), (2,), (2,1,1)],
 [(), (2,), (2,1,2)],
 [(), (2,), (2,2)]]

Any help would be greatly appreciated! Thanks.

Answer 1

You can do this in two steps: first, build a trie, or prefix tree, of the tuples:

tuples = set([(), (1,), (1,1), (1,2), (2,), (2,1,1), (2,1,2), (2,2)])

tree = {}
for tpl in tuples:
    t = tree
    for x in tpl:
        t = t.setdefault(x, {})

For your example, tree would be {1: {1: {}, 2: {}}, 2: {1: {1: {}, 2: {}}, 2: {}}}
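As a quick check of the structure described above, the construction can be wrapped in a function and compared against that dictionary. The function name build_trie is an assumption made for this sketch:

```python
def build_trie(tuples):
    """Build a nested-dict prefix tree; each key is one tuple element."""
    tree = {}
    for tpl in tuples:
        node = tree
        for x in tpl:
            # Descend into the child for x, creating it if it is missing.
            node = node.setdefault(x, {})
    return tree


tuples = {(), (1,), (1, 1), (1, 2), (2,), (2, 1, 1), (2, 1, 2), (2, 2)}
tree = build_trie(tuples)
print(tree == {1: {1: {}, 2: {}}, 2: {1: {1: {}, 2: {}}, 2: {}}})  # True
```

Note that the tree contains a node for the path (2, 1) even though that tuple is not in the input, which is why the traversal step has to check membership in the original set.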

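Then do a DFS over the tree, adding the current tuple (the path in the tree) to the group whenever it is in tuples (a set, for faster lookup). (Leaves in the tree are always valid tuples.)

This can be written as a recursive generator: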

def find_groups(tree, path):
    if len(tree) == 0:
        yield [path]
    for x in tree:
        for res in find_groups(tree[x], path + (x,)):
            yield [path] + res if path in tuples else res
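To see the two steps working together, here is a self-contained sketch that builds the tree and runs the traversal on the example input. Two things differ from the snippets above and are assumptions of this sketch: tuples is passed as an explicit parameter instead of being read as a global, and the tree is built from sorted(tuples) so that the output order is predictable.

```python
def find_groups(tree, path, tuples):
    """Yield one group per leaf: the prefixes of the leaf found in tuples."""
    if not tree:
        # A leaf is always one of the input tuples.
        yield [path]
    for x in tree:
        for res in find_groups(tree[x], path + (x,), tuples):
            # Keep the current path only if it is an input tuple.
            yield [path] + res if path in tuples else res


tuples = {(), (1,), (1, 1), (1, 2), (2,), (2, 1, 1), (2, 1, 2), (2, 2)}

# Build the prefix tree as nested dicts.
tree = {}
for tpl in sorted(tuples):
    node = tree
    for x in tpl:
        node = node.setdefault(x, {})

groups = list(find_groups(tree, (), tuples))
for group in groups:
    print(group)
# [(), (1,), (1, 1)]
# [(), (1,), (1, 2)]
# [(), (2,), (2, 1, 1)]
# [(), (2,), (2, 1, 2)]
# [(), (2,), (2, 2)]
```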

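The complexity of building and traversing the tree should be O(k), where k is the sum of the lengths of all the tuples, which bounds the total number of intermediate and leaf nodes in the tree.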
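To make the relationship between k and the tree size concrete, the following sketch counts both for the example input. The helper count_nodes is hypothetical and only serves this illustration; because shared prefixes are stored once, the node count cannot exceed k.

```python
def count_nodes(tree):
    """Count all nodes below the root of a nested-dict trie."""
    return sum(1 + count_nodes(child) for child in tree.values())


tuples = {(), (1,), (1, 1), (1, 2), (2,), (2, 1, 1), (2, 1, 2), (2, 2)}

tree = {}
for tpl in tuples:
    node = tree
    for x in tpl:
        node = node.setdefault(x, {})

k = sum(len(tpl) for tpl in tuples)  # total number of elements
print(k)                  # 14
print(count_nodes(tree))  # 8
```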

Answer 2

This is not the most efficient solution, but it produces the desired output and should keep working as the maximum tuple size increases:

s = [(), (1,), (1,1), (1,2), (2,), (2,1,1), (2,1,2), (2,2)]

def tupleSameGroup(tuple1, tuple2, sameGroup=True):

    if any(tuple1[idx]!=tuple2[idx] for idx in range(len(tuple1))):
        return False
    return sameGroup

groups = [[i, j] for i in s for j in [x for x in s if len(x)>len(i)] if tupleSameGroup(i, j)]

This yields:

[[(), (1,)], [(), (1, 1)], [(), (1, 2)], [(), (2,)], [(), (2, 1, 1)], [(), (2, 1, 2)], [(), (2, 2)], [(1,), (1, 1)], [(1,), (1, 2)], [(2,), (2, 1, 1)], [(2,), (2, 1, 2)], [(2,), (2, 2)]]

You can then combine these groups based on their common elements:

combined_groups = [sorted(list(set(i) | set(j))) for i in groups for j in groups if i[-1] in j and i!=j]

This yields:

[[(), (1,), (1, 1)], [(), (1,), (1, 2)], [(), (1,), (1, 1)], [(), (1,), (1, 2)], [(), (2,), (2, 1, 1)], [(), (2,), (2, 1, 2)], [(), (2,), (2, 2)], [(), (2,), (2, 1, 1)], [(), (2,), (2, 1, 2)], [(), (2,), (2, 2)], [(), (1,), (1, 1)], [(), (1,), (1, 2)], [(), (2,), (2, 1, 1)], [(), (2,), (2, 1, 2)], [(), (2,), (2, 2)]]

Finally, we can create a new list without any duplicates:

no_duplicates = []
for i in combined_groups:
    if i not in no_duplicates:
        no_duplicates.append(i)

This yields:

[[(), (1,), (1, 1)],
 [(), (1,), (1, 2)],
 [(), (2,), (2, 1, 1)],
 [(), (2,), (2, 1, 2)],
 [(), (2,), (2, 2)]]
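Merging two pairs at a time produces groups of at most three tuples, so inputs with longer prefix chains may need a different approach. One possible brute-force alternative, which is not part of this answer, is to start from each "maximal" tuple (one that is not a proper prefix of any other input tuple) and collect every prefix of it that occurs in the input. The function and helper names below are hypothetical:

```python
def groups_by_maximal_tuples(tuples):
    """Return one group per maximal tuple, made of its prefixes in the input."""
    present = set(tuples)

    def is_proper_prefix(a, b):
        return len(a) < len(b) and b[:len(a)] == a

    # Maximal tuples are not a proper prefix of any other input tuple.
    maximal = [t for t in present
               if not any(is_proper_prefix(t, u) for u in present)]

    # For each maximal tuple, keep the prefixes that are input tuples.
    return sorted(
        [t[:n] for n in range(len(t) + 1) if t[:n] in present]
        for t in maximal
    )


s = [(), (1,), (1, 1), (1, 2), (2,), (2, 1, 1), (2, 1, 2), (2, 2)]
for group in groups_by_maximal_tuples(s):
    print(group)
# [(), (1,), (1, 1)]
# [(), (1,), (1, 2)]
# [(), (2,), (2, 1, 1)]
# [(), (2,), (2, 1, 2)]
# [(), (2,), (2, 2)]
```

This sketch compares every pair of tuples, so it is quadratic in the number of tuples and is meant for clarity rather than speed.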

Efficient algorithm for grouping variable-length tuples
