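By cluster, I mean a group of connected, overlapping circles. This picture may give a better idea of what I'm trying to find:
In my data, circles are represented by their center point coordinates. I've already done the collision detection to produce a list of center point pairs representing overlaps: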
pts = [(-2,2), (-2,2), (0,0), (2,1), (6,2), (7,1)]
overlaps = [
    (pts[0], pts[1]),
    (pts[0], pts[2]),
    (pts[1], pts[2]),
    (pts[2], pts[3]),
    (pts[4], pts[5]),
]
This is the expected result:
expected_clusters = [
    ((-2,2), (-2,2), (0,0), (2,1)),
    ((6,2), (7,1))
]
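For context, a pair list like `overlaps` above could come from a brute-force check of every pair of circles. The sketch below is only an illustration: the question does not state the radii, so it assumes (hypothetically) that all circles share a radius of 1.5, a value chosen because it happens to reproduce the pairs listed above. `find_overlaps`, `radius`, and `index_pairs` are made-up names.

```python
from itertools import combinations
from math import hypot

def find_overlaps(points, radius):
    """Return index pairs (i, j), i < j, of circles that overlap.

    Assumes every circle has the same radius (an assumption, not from the question).
    """
    pairs = []
    for (i, (x1, y1)), (j, (x2, y2)) in combinations(enumerate(points), 2):
        # Equal-radius circles overlap when their centers are closer
        # than the sum of the two radii.
        if hypot(x2 - x1, y2 - y1) < 2 * radius:
            pairs.append((i, j))
    return pairs

pts = [(-2, 2), (-2, 2), (0, 0), (2, 1), (6, 2), (7, 1)]
index_pairs = find_overlaps(pts, radius=1.5)           # hypothetical radius
overlaps = [(pts[i], pts[j]) for i, j in index_pairs]  # same shape as the list above
print(index_pairs)
```

Keeping the index pairs around, rather than only the coordinate pairs, makes it possible to tell the two circles centered at (-2,2) apart later on.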
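In practice, the datasets I'll be working with will be about this size, so I'll probably never need to scale this up. That's not to say I wouldn't welcome a more optimal solution.
I've come up with my own naive solution, which I'll post as an answer, but I'm interested in seeing other solutions.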
Answer 0: (score: 3)
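What you're doing isn't really cluster analysis; it's connected component analysis. Cluster analysis would take a whole bunch of individual points and try to discover the circles. But it may interest you that the combination of assigning points to initial neighborhoods and cluster-based reachability through overlapping neighborhoods is the core of the idea behind DBSCAN and its density-based clustering variants.
In any case, since you're starting with circles, once you've done the collision detection, what you call the overlap list is an adjacency list, and what you call clusters are connected components. The algorithm is pretty simple: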
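1. Create a list L of all nodes.
2. Create an empty list of connected components, Cs.
3. While L is not empty:
   - Pick an arbitrary node N.
   - Create a connected component list C, initialized with N.
   - Do a breadth-first or depth-first traversal using the adjacency list, adding each node encountered to C.
   - Append C to Cs.
   - Remove all nodes in C from L.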
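The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the answerer's own code. It assumes nodes are identified by their index in the point list and that overlaps are given as index pairs (both are assumptions made here so that the two circles sharing the center (-2,2) stay distinct). `connected_components` and `index_pairs` are made-up names.

```python
from collections import defaultdict, deque

def connected_components(num_nodes, edges):
    """Group node indices 0..num_nodes-1 into connected components."""
    # Build an undirected neighbor lookup from the pair list.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    remaining = set(range(num_nodes))    # L: nodes not yet assigned
    components = []                      # Cs
    while remaining:
        start = min(remaining)           # N (smallest index, for stable output)
        remaining.discard(start)
        component = [start]              # C
        queue = deque([start])
        while queue:                     # breadth-first traversal
            node = queue.popleft()
            for nxt in sorted(neighbors[node]):
                if nxt in remaining:
                    remaining.discard(nxt)
                    component.append(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

pts = [(-2, 2), (-2, 2), (0, 0), (2, 1), (6, 2), (7, 1)]
index_pairs = [(0, 1), (0, 2), (1, 2), (2, 3), (4, 5)]  # assumed index form of the overlaps
clusters = [tuple(pts[i] for i in comp)
            for comp in connected_components(len(pts), index_pairs)]
print(clusters)
```

Removing each node from `remaining` as soon as it is reached folds the final "remove all nodes in C from L" step into the traversal itself.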
Answer 1: (score: 1)
Edited my original response in favor of acjohnson55's algorithm:
center_pts = [(-2,2), (-2,2), (0,0), (2,1), (6,2), (7,1)]
overlapping_circle_pts = [
    (center_pts[0], center_pts[1]),
    (center_pts[0], center_pts[2]),
    (center_pts[1], center_pts[2]),
    (center_pts[2], center_pts[3]),
    (center_pts[4], center_pts[5]),
]
expected_solution = [
    [(-2,2), (-2,2), (0,0), (2,1)],
    [(6,2), (7,1)]
]
def cluster_overlaps(nodes, adjacency_list):
    clusters = []
    nodes = list(nodes)  # work on a copy so the caller's list is not mutated
    while nodes:
        node = nodes[0]
        path = dfs(node, adjacency_list, nodes)
        # the path is one connected component
        clusters.append(path)
        # remove the component's nodes from the remaining nodes
        for pt in path:
            nodes.remove(pt)
    return clusters

def dfs(start, adjacency_list, nodes):
    """ref: http://code.activestate.com/recipes/576723/"""
    path = []
    q = [start]
    while q:
        node = q.pop(0)
        # cycle detection; comparing counts lets duplicate points each appear once
        if path.count(node) >= nodes.count(node):
            continue
        path = path + [node]
        # overlaps are symmetric, so follow each pair in both directions
        next_nodes = [p2 for p1, p2 in adjacency_list if p1 == node]
        next_nodes += [p1 for p1, p2 in adjacency_list if p2 == node]
        q = next_nodes + q
    return path

print(cluster_overlaps(center_pts, overlapping_circle_pts))
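Since the question invites more optimal solutions, one possible alternative for larger inputs is a disjoint-set (union-find) structure. This is a different technique from the traversal above, sketched under the assumption that overlaps are available as pairs of indices into the point list rather than pairs of coordinates. `find`, `cluster_by_union_find`, and `index_pairs` are illustrative names, not part of the original answer.

```python
def find(parent, i):
    """Return the root of i's set, shortening the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i

def cluster_by_union_find(points, index_pairs):
    """Group points into clusters by merging the sets of each overlapping pair."""
    parent = list(range(len(points)))  # every point starts in its own set
    for a, b in index_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            # keep the smaller index as the root so output order is stable
            parent[max(root_a, root_b)] = min(root_a, root_b)
    groups = {}
    for i, pt in enumerate(points):
        groups.setdefault(find(parent, i), []).append(pt)
    return list(groups.values())

center_pts = [(-2, 2), (-2, 2), (0, 0), (2, 1), (6, 2), (7, 1)]
index_pairs = [(0, 1), (0, 2), (1, 2), (2, 3), (4, 5)]  # assumed index form of the overlaps
print(cluster_by_union_find(center_pts, index_pairs))
```

Each pair is processed once, so nothing is rescanned per node. For datasets as small as the one in the question, the difference is unlikely to matter.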