I'm trying to relate a dendrogram to agglomerative hierarchical clustering, and I need the distance matrix. I started with:
import numpy as np
import pandas as pd
from scipy import ndimage
from scipy.cluster import hierarchy
from scipy.spatial import distance_matrix
from matplotlib import pyplot as plt
from sklearn import manifold, datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator was removed in scikit-learn 0.24
%matplotlib inline
X1, y1 = make_blobs(n_samples=50, centers=[[4,4], [-2, -1], [1, 1], [10,4]], cluster_std=0.9)
plt.scatter(X1[:, 0], X1[:, 1], marker='o')
agglom = AgglomerativeClustering(n_clusters = 4, linkage = 'average')
agglom.fit(X1,y1)
# Create a figure of size 6 inches by 4 inches.
plt.figure(figsize=(6,4))
# Min-max scale the data into the unit square;
# otherwise the text labels would be spread very far apart.
# Per-column minimum and maximum of X1.
x_min, x_max = np.min(X1, axis=0), np.max(X1, axis=0)
# Rescale each feature of X1 to the range [0, 1].
X1 = (X1 - x_min) / (x_max - x_min)
# This loop displays all of the datapoints.
for i in range(X1.shape[0]):
    # Draw each point's true blob label (e.g. 0) as text, color-coded
    # by its agglomerative cluster label using the plt.cm.nipy_spectral colormap
    plt.text(X1[i, 0], X1[i, 1], str(y1[i]),
             color=plt.cm.nipy_spectral(agglom.labels_[i] / 10.),
             fontdict={'weight': 'bold', 'size': 9})
# Remove the x and y ticks
plt.xticks([])
plt.yticks([])
# plt.axis('off')  # uncomment to hide the axes entirely
# Display the plot of the original data before clustering
plt.scatter(X1[:, 0], X1[:, 1], marker='.')
# Display the plot
plt.show()
dist_matrix = distance_matrix(X1,X1)
print(dist_matrix)
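For reference, `distance_matrix(X1, X1)` returns the square (uncondensed) n x n form, while `scipy.spatial.distance.pdist` returns the condensed form directly: a 1-D array holding only the n*(n-1)/2 entries above the diagonal. `squareform` converts between the two. A minimal sketch, assuming a small random point set as a stand-in for the `make_blobs` data:

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.spatial.distance import pdist, squareform

# Small stand-in for X1: 6 random points in 2-D (arbitrary data).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))

# Square (uncondensed) form: n x n, symmetric, zeros on the diagonal.
square = distance_matrix(X, X)

# Condensed form: the n*(n-1)/2 upper-triangle entries as a 1-D array.
condensed = pdist(X)

print(square.shape)     # (6, 6)
print(condensed.shape)  # (15,)

# squareform converts in both directions.
print(np.allclose(squareform(square), condensed))  # True
print(np.allclose(squareform(condensed), square))  # True
```

Using `pdist(X1)` skips building the square matrix altogether.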
When I run this line, I get a warning:
Z = hierarchy.linkage(dist_matrix, 'complete')
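/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/ipykernel_launcher.py:1: ClusterWarning: scipy.cluster: The symmetric non-negative hollow observation matrix looks suspiciously like an uncondensed distance matrix
  """Entry point for launching an IPython kernel.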
First, what does this mean, and how can I fix it? Thanks.
Answer 0 (score: 1)
It means that X1 is too close to X1.T in agglom.fit(X1,y1).
You can add the following lines at the top of your script to ignore it:
from scipy.cluster.hierarchy import ClusterWarning
from warnings import simplefilter
simplefilter("ignore", ClusterWarning)
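Note that silencing the warning does not change what `linkage` does with a square matrix: any 2-D input is treated as a set of observation vectors, and `linkage` computes fresh distances between its rows (via `pdist`) before clustering. If you still want to silence it, `warnings.catch_warnings()` limits the filter to a single block instead of the whole session. A minimal sketch with arbitrary random points, showing that the silenced call builds the same tree as clustering the rows of `D` as 10-dimensional points:

```python
import warnings

import numpy as np
from scipy.cluster import hierarchy
from scipy.cluster.hierarchy import ClusterWarning
from scipy.spatial import distance_matrix
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
D = distance_matrix(X, X)  # square 10 x 10 distance matrix, as in the question

# Suppress ClusterWarning only inside this block; the previous filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ClusterWarning)
    Z_square = hierarchy.linkage(D, "complete")

# What linkage actually did: treat the 10 rows of D as 10-dimensional
# observations and cluster on Euclidean distances between those rows.
Z_rows = hierarchy.linkage(pdist(D), "complete")
print(np.allclose(Z_square, Z_rows))  # True

# What was intended: cluster on the original pairwise distances.
Z_intended = hierarchy.linkage(pdist(X), "complete")

# Final merge heights of the two trees (these generally differ).
print(Z_rows[-1, 2], Z_intended[-1, 2])
```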
Answer 1 (score: 0)
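scipy.cluster.hierarchy.linkage expects a condensed distance matrix, not a squareform/uncondensed one. A 2-D array passed to linkage is interpreted as m observation vectors in n dimensions, so your 50 x 50 matrix is treated as 50 points with 50 features each; SciPy warns because that array looks like a square distance matrix. You have computed a square distance matrix and need to convert it to condensed form. I suggest using scipy.spatial.distance.squareform. The following snippet reproduces your functionality without the warning (I have removed the plotting for brevity).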
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from scipy.spatial import distance_matrix
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
X1, y1 = make_blobs(n_samples=50, centers=[[4,4],
[-2, -1],
[1, 1],
[10,4]], cluster_std=0.9)
agglom = AgglomerativeClustering(n_clusters = 4, linkage = 'average')
agglom.fit(X1,y1)
dist_matrix = distance_matrix(X1,X1)
print(dist_matrix.shape)
condensed_dist_matrix = squareform(dist_matrix)
print(condensed_dist_matrix.shape)
Z = hierarchy.linkage(condensed_dist_matrix, 'complete')
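With a condensed input, the linkage matrix can go straight into the dendrogram the question set out to draw, and `fcluster` can cut the same tree into a fixed number of flat clusters, for example to compare against `agglom.labels_` (note that `fcluster` numbers clusters from 1). A minimal sketch, assuming synthetic blobs built with NumPy around the question's centres (so it runs without scikit-learn) and `method="average"` to mirror the `linkage='average'` passed to `AgglomerativeClustering`. `no_plot=True` only computes the dendrogram layout; drop it, with matplotlib installed, to draw the figure:

```python
import warnings

import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist

# Four synthetic blobs around the question's centres (stand-in data, 12 points each).
rng = np.random.default_rng(42)
centers = np.array([[4, 4], [-2, -1], [1, 1], [10, 4]], dtype=float)
X = np.vstack([c + 0.9 * rng.standard_normal((12, 2)) for c in centers])

# Turn ClusterWarning into an error so a wrongly shaped input would fail loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error", hierarchy.ClusterWarning)
    Z = hierarchy.linkage(pdist(X), method="average")

# Cut the tree into at most 4 flat clusters; labels start at 1.
labels = hierarchy.fcluster(Z, t=4, criterion="maxclust")
print(np.unique(labels))

# Dendrogram layout without drawing; "leaves" is the left-to-right leaf order.
dendro = hierarchy.dendrogram(Z, no_plot=True)
print(len(dendro["leaves"]))  # 48
```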