Find all clusters of keys in a list

Time: 2019-03-01 12:37:44

Tags: python python-3.x

I have a "combinatorial" problem: I am trying to find the distinct groups (clusters) of keys, and I am looking for an optimized solution.

I have a list of lists "l":

l = [[1, 5],
 [5, 7],
 [4, 9],
 [7, 9],
 [50, 90],
 [100, 200],
 [90, 100],
 [2, 90],
 [7, 50],
 [9, 21],
 [5, 10],
 [8, 17],
 [11, 15],
 [3, 11]]

Each ID is linked to another ID, but it may also be linked to further keys through an intermediate key. The goal is to find, efficiently, all the keys that belong to the same cluster.

The desired result is:

[{1, 2, 4, 5, 7, 9, 10, 21, 50, 90, 100, 200}, {8, 17}, {3, 11, 15}]

My current code produces the result shown above. The problem comes when I run it on 2 million rows: it takes a very long time.

Is there a more efficient way to solve this problem?

2 answers:

Answer 0 (score: 3)

You can treat this as the problem of finding the connected components of a graph:

l = [[1, 5], [5, 7], [4, 9], [7, 9], [50, 90], [100, 200], [90, 100],
     [2, 90], [7, 50], [9, 21], [5, 10], [8, 17], [11, 15], [3, 11]]
# Make graph-like dict
graph = {}
for i1, i2 in l:
    graph.setdefault(i1, set()).add(i2)
    graph.setdefault(i2, set()).add(i1)
# Find clusters
clusters = []
for start, ends in graph.items():
    # If vertex is already in a cluster skip
    if any(start in cluster for cluster in clusters):
        continue
    # Cluster set
    cluster = {start}
    # Process neighbors transitively
    queue = list(ends)
    while queue:
        v = queue.pop()
        # If vertex is new
        if v not in cluster:
            # Add it to cluster and put neighbors in queue
            cluster.add(v)
            queue.extend(graph[v])
    # Save cluster
    clusters.append(cluster)
print(*clusters)
# {1, 2, 100, 5, 4, 7, 200, 9, 10, 50, 21, 90} {8, 17} {3, 11, 15}
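
With 2 million rows, one likely bottleneck in the code above is the `any(start in cluster for cluster in clusters)` check, which scans every cluster found so far for each vertex. A minimal sketch of the same traversal, assuming the input is a list of pairs as in the question, keeps a single `seen` set so each membership test is O(1) and the total work stays O(V + E). The function name `find_clusters` is hypothetical:

```python
from collections import deque


def find_clusters(pairs):
    """Return the connected components of an undirected graph given as edge pairs."""
    # Build an adjacency map: node -> set of neighbours
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    seen = set()  # every node already assigned to some cluster
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search from start collects its whole component
        cluster = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    cluster.add(w)
                    queue.append(w)
        clusters.append(cluster)
    return clusters


l = [[1, 5], [5, 7], [4, 9], [7, 9], [50, 90], [100, 200], [90, 100],
     [2, 90], [7, 50], [9, 21], [5, 10], [8, 17], [11, 15], [3, 11]]
clusters = find_clusters(l)
print(*clusters)
```

Here `clusters` holds the same three sets as the answer's output; the order of elements inside each printed set may differ.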

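Answer 1 (score: 2)

This is a classic use case for the union-find algorithm / disjoint-set data structure. AFAIK there is no implementation in the Python standard library, but I always tend to keep one around because it is so useful...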

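from collections import defaultdict

l = [[1, 5], [5, 7], [4, 9], [7, 9], [50, 90], [100, 200], [90, 100],
     [2, 90], [7, 50], [9, 21], [5, 10], [8, 17], [11, 15], [3, 11]]

# leaders[x] is x's parent; None means x is the leader of its group
leaders = defaultdict(lambda: None)

def find(x):
    # Walk up to the leader iteratively (recursion can hit Python's limit on long chains)
    root = x
    while leaders[root] is not None:
        root = leaders[root]
    # Path compression: point every node on the path directly at the leader
    while x != root:
        parent = leaders[x]
        leaders[x] = root
        x = parent
    return root

# union all elements that transitively belong together
for a, b in l:
    ra, rb = find(a), find(b)
    # Skip pairs already in the same group: leaders[ra] = ra would make find() loop forever
    if ra != rb:
        leaders[ra] = rb

# get groups of elements with the same leader
groups = defaultdict(set)
for x in leaders:
    groups[find(x)].add(x)
print(*groups.values())
# {1, 2, 4, 5, 100, 7, 200, 9, 10, 50, 21, 90} {8, 17} {3, 11, 15}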

For n nodes the runtime complexity should be about O(n log n), since each lookup takes about log n steps to reach (and update) the leader.
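
The O(n log n) estimate relies on path compression alone. A common refinement, shown here as a minimal sketch assuming the keys are hashable, also attaches the smaller group under the larger one (union by size); combined with path compression this makes the amortized cost per operation nearly constant (inverse Ackermann). `DisjointSet` and its method names are hypothetical, not a standard-library class:

```python
class DisjointSet:
    """Hypothetical union-find helper with path compression and union by size."""

    def __init__(self):
        self.parent = {}  # node -> parent node (a root is its own parent)
        self.size = {}    # root -> number of nodes in its group

    def find(self, x):
        # Register unseen elements as singleton groups
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # Locate the root iteratively
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return  # already in the same group (e.g. the pair closes a cycle)
        # Attach the smaller tree under the larger one to keep trees shallow
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def groups(self):
        # Collect every registered node under its root
        result = {}
        for x in self.parent:
            result.setdefault(self.find(x), set()).add(x)
        return list(result.values())


l = [[1, 5], [5, 7], [4, 9], [7, 9], [50, 90], [100, 200], [90, 100],
     [2, 90], [7, 50], [9, 21], [5, 10], [8, 17], [11, 15], [3, 11]]
ds = DisjointSet()
for a, b in l:
    ds.union(a, b)
groups = ds.groups()
print(*groups)
```

Because `find` is iterative and `union` ignores pairs that are already connected, this variant handles inputs with cycles and very long chains without recursion errors.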