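I am looking for a way to get the coordinates of cluster points in a dendrogram plot, based on the ClusterNode objects returned by to_tree.
I build the dendrogram with scipy from the following data: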
X = data
Y = pdist(X)
Z = linkage(Y)
dend = dendrogram(Z)
rootnode, nodesList = to_tree(Z, rd=True)
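What I would like to do is build a function get_coords(someClusterNode) that returns a tuple (x, y) giving the node's position in the plot.
Thanks to this answer, I managed to work out how to get positions from the dendrogram return value, for example: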
i, d = list(zip(dend['icoord'], dend['dcoord']))[-1]
x = 0.5 * sum(i[1:3])
y = d[1]
plt.plot(x, y, 'ro')
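The same arithmetic applies to every link, not only the last one. The following is a minimal sketch under a few assumptions: random sample data stands in for `data`, the dendrogram uses its default orientation, and the name `apexes` is made up for illustration. Passing `no_plot=True` only skips the drawing step.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

# hypothetical sample data standing in for `data`
rng = np.random.default_rng(0)
X = rng.random((8, 3))

Z = linkage(pdist(X))
dend = dendrogram(Z, no_plot=True)  # compute the coordinates without drawing

# each link is a "U" described by four points:
# (i[0], d[0]) -> (i[1], d[1]) -> (i[2], d[2]) -> (i[3], d[3])
# the merge point sits in the middle of the horizontal bar
apexes = [(0.5 * (i[1] + i[2]), d[1])
          for i, d in zip(dend['icoord'], dend['dcoord'])]

for k, (x, y) in enumerate(apexes):
    print(k, x, y)
```

This yields one point per merge, but in the order of the rows of icoord/dcoord, which is exactly the ordering in question.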
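But I cannot figure out the relationship between the ordering of nodesList and the ordering of icoord/dcoord, so that I could map one onto the other.
Do you know where I should look?
Thanks for your help!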
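Answer 0 (score: 2)
Each dendrogram maps to exactly one tree of ClusterNodes, but any tree of ClusterNodes can map to an infinite number of dendrograms. The mapping from node ID to (x, y) position should therefore be just another field in the dendrogram data structure rather than a function of the ClusterNode. So instead of defining a function get_coords, I attach a dictionary to dend that maps node IDs to (x, y) coordinates. You can access it with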
x,y = dend['node_id_to_coord'][node_id] # node_id is an integer as returned by ClusterNode.id
Code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, to_tree
from scipy.spatial.distance import pdist
# create some random data
X = np.random.rand(10, 3)
# get dendrogram
Z = linkage(pdist(X), method="ward")
dend = dendrogram(Z)
# ----------------------------------------
# get leaf coordinates, which are at y == 0
def flatten(l):
    return [item for sublist in l for item in sublist]
xs = flatten(dend['icoord'])
ys = flatten(dend['dcoord'])
leaf_coords = [(x, y) for x, y in zip(xs, ys) if y == 0]
# in the dendrogram data structure,
# leaf ids are listed in ascending order according to their x-coordinate
order = np.argsort([x for x, y in leaf_coords])
id_to_coord = dict(zip(dend['leaves'], [leaf_coords[idx] for idx in order])) # <- main data structure
# ----------------------------------------
# get coordinates of other nodes
# this should work but doesn't
# (the rows of dend['icoord'] / dend['dcoord'] follow the plotting order, not the row order of Z):
# # traverse tree from leaves upwards and populate mapping ID -> (x,y);
# # use linkage matrix to traverse the tree optimally
# # (each row in the linkage matrix corresponds to a row in dend['icoord'] and dend['dcoord'])
# root_node, node_list = to_tree(Z, rd=True)
# for ii, (X, Y) in enumerate(zip(dend['icoord'], dend['dcoord'])):
#     x = (X[1] + X[2]) / 2
#     y = Y[1] # or Y[2]
#     node_id = ii + len(dend['leaves'])
#     id_to_coord[node_id] = (x, y)
# so we need to do it the hard way:
# map endpoint of each link to coordinates of parent node
children_to_parent_coords = dict()
for i, d in zip(dend['icoord'], dend['dcoord']):
    x = (i[1] + i[2]) / 2
    y = d[1] # or d[2]
    parent_coord = (x, y)
    left_coord = (i[0], d[0])
    right_coord = (i[-1], d[-1])
    children_to_parent_coords[(left_coord, right_coord)] = parent_coord
# traverse tree from leaves upwards and populate mapping ID -> (x,y)
root_node, node_list = to_tree(Z, rd=True)
ids_left = range(len(dend['leaves']), len(node_list))
while len(ids_left) > 0:
    for node_id in ids_left:
        node = node_list[node_id]
        # a node can be placed once both of its children have coordinates
        if (node.left.id in id_to_coord) and (node.right.id in id_to_coord):
            left_coord = id_to_coord[node.left.id]
            right_coord = id_to_coord[node.right.id]
            id_to_coord[node_id] = children_to_parent_coords[(left_coord, right_coord)]
    ids_left = [node_id for node_id in range(len(node_list)) if node_id not in id_to_coord]
# plot result on top of dendrogram
ax = plt.gca()
for node_id, (x, y) in id_to_coord.items():  # .iteritems() exists only in Python 2
    if not node_list[node_id].is_leaf():
        ax.plot(x, y, 'ro')
        ax.annotate(str(node_id), (x, y), xytext=(0, -8),
                    textcoords='offset points',
                    va='top', ha='center')
dend['node_id_to_coord'] = id_to_coord
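As a cross-check on the dictionary built above, the same coordinates can also be derived directly from the tree. This is a different technique from the one in the answer, not a restatement of it, and it is only a sketch under stated assumptions: the dendrogram is drawn with default options (no truncation, default orientation), in which case leaves sit at x = 5, 15, 25, ... in the order of dend['leaves'], each internal node sits midway between its two children, and its height is ClusterNode.dist. The function name `node_coords` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, to_tree
from scipy.spatial.distance import pdist


def node_coords(Z, dend):
    """Map ClusterNode.id -> (x, y), assuming a default, untruncated dendrogram."""
    root = to_tree(Z)
    # assumption: leaves are drawn at x = 5, 15, 25, ... in the order of dend['leaves']
    coords = {leaf_id: (10.0 * pos + 5.0, 0.0)
              for pos, leaf_id in enumerate(dend['leaves'])}

    def visit(node):
        # returns the x position of `node`, filling `coords` for internal nodes
        if node.is_leaf():
            return coords[node.id][0]
        x = 0.5 * (visit(node.left) + visit(node.right))
        coords[node.id] = (x, node.dist)
        return x

    visit(root)
    return coords


rng = np.random.default_rng(0)
X = rng.random((10, 3))
Z = linkage(pdist(X), method="ward")
dend = dendrogram(Z, no_plot=True)  # coordinates only, no drawing
coords = node_coords(Z, dend)
print(coords[len(dend['leaves'])])  # (x, y) of the first merge
```

Because this version never uses floating-point tuples as dictionary keys, it avoids relying on exact equality between link endpoints, at the cost of depending on the default layout.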
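Answer 1 (score: 0)
There is another way:
The dendrogram's indices appear to be generated by a reversed right-to-left traversal of the tree. This lets us construct a translation from node.id to an index into icoord and dcoord, as follows: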
def rl_traversal(node):
    # skipping leaves
    if not node.is_leaf():
        yield node.id
        yield from rl_traversal(node.right)
        yield from rl_traversal(node.left)
# root is the ClusterNode returned by to_tree(Z)
id_map = dict(zip(rl_traversal(root), reversed(range(root.get_count() - 1))))
# id_map[node_id] = dendrogram_id
The coordinates can then be accessed via dendo['icoord'][id_map[node_id]]
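Put together, the traversal above might be used as in the following sketch. It assumes random sample data and a dendrogram computed with default options; the helper `get_coords` and the variable `n_links` are hypothetical names added for illustration, and the helper is only meaningful for internal (non-leaf) nodes.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, to_tree
from scipy.spatial.distance import pdist


def rl_traversal(node):
    # pre-order traversal that visits the right child first; leaves are skipped
    if not node.is_leaf():
        yield node.id
        yield from rl_traversal(node.right)
        yield from rl_traversal(node.left)


rng = np.random.default_rng(0)
X = rng.random((10, 3))
Z = linkage(pdist(X), method="ward")
dend = dendrogram(Z, no_plot=True)  # coordinates only, no drawing
root = to_tree(Z)

# node id -> row index in dend['icoord'] / dend['dcoord']
n_links = root.get_count() - 1
id_map = dict(zip(rl_traversal(root), reversed(range(n_links))))


def get_coords(node_id):
    # hypothetical helper: valid only for internal nodes
    i = dend['icoord'][id_map[node_id]]
    d = dend['dcoord'][id_map[node_id]]
    return 0.5 * (i[1] + i[2]), d[1]


print(get_coords(root.id))
```

A quick sanity check is to compare the returned y value with the dist attribute of the corresponding ClusterNode; if options such as truncate_mode, count_sort or distance_sort are used, the traversal order may no longer match.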