Question

By cluster, I mean a group of connected, overlapping circles. This image may give a better idea of what I'm trying to find:

[image: groups of connected, overlapping circles]

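In my data, circles are represented by the coordinates of their center points. I have already done the collision detection to produce a list of pairs of center points that represent overlaps: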

pts = [(-2,2), (-2,2), (0,0), (2,1), (6,2), (7,1)]

overlaps = [
    (pts[0], pts[1]),
    (pts[0], pts[2]),
    (pts[1], pts[2]),
    (pts[2], pts[3]),
    (pts[4], pts[5]),
]

This is the expected result:

expected_clusters = [
    ((-2,2), (-2,2), (0,0), (2,1)),
    ((6,2), (7,1))
]
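
For context, the collision-detection step mentioned above could look roughly like the sketch below. It assumes each circle also has a radius, which the question does not give: the `radii` values and the `find_overlaps` name are made up for illustration, and circles that merely touch are treated as overlapping. Two circles overlap when the distance between their centers is at most the sum of their radii. The sketch returns index pairs rather than point pairs, so two circles with the same center stay distinguishable.

```python
import math
from itertools import combinations


def find_overlaps(centers, radii):
    """Return index pairs (i, j), i < j, of circles that overlap or touch."""
    overlaps = []
    for i, j in combinations(range(len(centers)), 2):
        (x1, y1), (x2, y2) = centers[i], centers[j]
        distance = math.hypot(x2 - x1, y2 - y1)
        # Assumption: circles that merely touch count as overlapping (<=).
        if distance <= radii[i] + radii[j]:
            overlaps.append((i, j))
    return overlaps


pts = [(-2, 2), (-2, 2), (0, 0), (2, 1), (6, 2), (7, 1)]
radii = [1.5, 2.0, 1.5, 1.5, 1.5, 1.5]  # hypothetical radii, not from the question

print(find_overlaps(pts, radii))
# [(0, 1), (0, 2), (1, 2), (2, 3), (4, 5)]
```

With these made-up radii, the index pairs correspond to the same five overlaps listed in the question.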

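In practice, the data sets I'll be working with will be about this size, so I will probably never need to scale this up. That's not to say I wouldn't welcome a more optimal solution.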

I came up with my own naive solution, which I will post as an answer. But I'm interested in seeing other solutions.

Answer 1

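What you're doing isn't cluster analysis, it's connected component analysis. Cluster analysis would take a whole bunch of individual points and try to discover the circles. But it might interest you that the combination of assigning points to initial neighborhoods and clustering based on reachability through overlapping neighborhoods is at the core of DBSCAN and its density-based clustering variants.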

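In any case, since you're starting with the circles, once you've done the collision detection, what you call the overlap list is an adjacency list, and what you call clusters are connected components. The algorithm is pretty straightforward: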

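Create a list L of all nodes.
Create an empty list Cs of connected components.
While L is not empty:
1. Pick an arbitrary node N.
2. Create a connected component list C, initialized with N.
3. Do a breadth-first or depth-first traversal using your adjacency list, adding each node you encounter to C.
4. Append C to Cs.
5. Remove all of the nodes in C from L.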
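
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not code from the answer: the function name `connected_components` is made up, the traversal is breadth-first, and nodes are identified by their index in `pts` rather than by coordinates (an assumption made here so that the two circles sharing the center `(-2, 2)` remain distinct nodes). The overlap pairs are treated as undirected edges.

```python
from collections import deque


def connected_components(num_nodes, edges):
    """Group node indices 0..num_nodes-1 into connected components.

    `edges` is an iterable of (i, j) index pairs, treated as undirected.
    """
    # Build an adjacency map so each node's neighbours can be looked up directly.
    neighbours = {i: set() for i in range(num_nodes)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    unvisited = set(range(num_nodes))  # plays the role of L
    components = []                    # plays the role of Cs

    while unvisited:
        start = min(unvisited)         # any node would do; min() keeps the output stable
        unvisited.remove(start)
        component = [start]            # plays the role of C, initialized with N
        queue = deque([start])

        # Breadth-first traversal over the adjacency map.
        while queue:
            node = queue.popleft()
            for nxt in sorted(neighbours[node]):
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    component.append(nxt)
                    queue.append(nxt)

        components.append(component)

    return components


# Usage with the question's data, expressed as indices into pts.
pts = [(-2, 2), (-2, 2), (0, 0), (2, 1), (6, 2), (7, 1)]
overlap_indices = [(0, 1), (0, 2), (1, 2), (2, 3), (4, 5)]

clusters = [[pts[i] for i in comp]
            for comp in connected_components(len(pts), overlap_indices)]
print(clusters)
# [[(-2, 2), (-2, 2), (0, 0), (2, 1)], [(6, 2), (7, 1)]]
```

Removing nodes from `unvisited` as they are reached folds step 5 into the traversal, so each node and each edge is handled once.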

Answer 2

Editing my original response in favor of acjohnson55's algorithm:

center_pts = [(-2,2), (-2,2), (0,0), (2,1), (6,2), (7,1)]

overlapping_circle_pts = [
    (center_pts[0], center_pts[1]),
    (center_pts[0], center_pts[2]),
    (center_pts[1], center_pts[2]),
    (center_pts[2], center_pts[3]),
    (center_pts[4], center_pts[5]),
]

expected_solution = [
    [(-2,2), (-2,2), (0,0), (2,1)],
    [(6,2), (7,1)]
]


def cluster_overlaps(nodes, adjacency_list):
    clusters = []
    nodes = list(nodes)  # make sure we're mutating a copy

    while len(nodes):
        node = nodes[0]
        path = dfs(node, adjacency_list, nodes)

        # append this connected component to clusters
        clusters.append(path)

        # remove all nodes in this component from the remaining nodes
        for pt in path:
            nodes.remove(pt)

    return clusters


def dfs(start, adjacency_list, nodes):
    """ref: http://code.activestate.com/recipes/576723/"""
    path = []
    q = [start]

    while q:
        node = q.pop(0)

        # cycle detection: visit a point at most as many times as it
        # appears in nodes (duplicate coordinates are allowed)
        if path.count(node) >= nodes.count(node):
            continue

        path = path + [node]

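        # get next nodes; overlaps are symmetric, so follow each pair
        # in both directions
        next_nodes = [p2 for p1, p2 in adjacency_list if p1 == node]
        next_nodes += [p1 for p1, p2 in adjacency_list if p2 == node]
        q = next_nodes + q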

    return path

print(cluster_overlaps(center_pts, overlapping_circle_pts))

In Python, how can I identify clusters of overlapping circles?
