I wrote the following code for hierarchical clustering, but I get the error below. Can you help me?
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the Mall dataset with pandas
dataset = pd.read_csv("https://raw.githubusercontent.com/akbarhusnoo/Chronic-Kidney-Disease-Prediction/main/chronic_kidney_disease.csv", na_values=["?"])
catCols = dataset.select_dtypes("object").columns
catCols = list(set(catCols))
for i in catCols:
    dataset.replace({i: {'?': np.nan}}, regex=False, inplace=True)
dataset.dropna(how='all')
X = dataset.iloc[:, [3,4]].values
# Using the dendrogram to find the optimal number of clusters
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method='ward' ))
plt.title('Dendrogram')
plt.xlabel('C')
plt.ylabel('Euclidean distances')
plt.show()
# Fitting the hierarchical clustering to the mall dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, affinity = 'euclidean', linkage = 'ward')
Y_hc = hc.fit_predict(X)
# Visualising the clusters
ValueError                                Traceback (most recent call last)
<ipython-input-30-2c6a60c0a6d0> in <module>
12
13 import scipy.cluster.hierarchy as sch
---> 14 dendrogram = sch.dendrogram(sch.linkage(X, method='ward' ))
15 plt.title('Dendrogram')
16 plt.xlabel('C')
~\anaconda3\lib\site-packages\scipy\cluster\hierarchy.py in linkage(y, method, metric, optimal_ordering)
1063
1064 if not np.all(np.isfinite(y)):
-> 1065 raise ValueError("The condensed distance matrix must contain only "
1066 "finite values.")
1067
ValueError: The condensed distance matrix must contain only finite values.
Answer 0 (score: 0)
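Your input dataset contains question marks, which causes the affected values to be read as strings rather than integers. Either convert the question marks to NaN after reading the CSV, or remove them from the input CSV file directly (an empty cell in a CSV is interpreted as NaN, so replacing every ,?, with ,, works well).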
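As a rough illustration of the two options above, here is a minimal sketch. The CSV text and column names are made up for the example and are not taken from the real file; pd.to_numeric with errors="coerce" is one possible way to do the conversion after reading, not necessarily the only one.

```python
import io

import pandas as pd

# Made-up sample standing in for the real CSV; column names are illustrative only.
raw = "age,bp,al,su\n48,80,1,0\n7,50,?,0\n62,?,2,3\n"

# Read without any special handling: columns that contain "?" are not numeric.
as_read = pd.read_csv(io.StringIO(raw))
print(as_read.dtypes)

# Option 1: convert after reading. Anything that is not a number becomes NaN.
converted = as_read.apply(pd.to_numeric, errors="coerce")

# Option 2: tell read_csv to treat "?" as a missing value while parsing.
direct = pd.read_csv(io.StringIO(raw), na_values=["?"])

print(converted.isna().sum())
print(direct.isna().sum())
```

Both approaches should leave NaN wherever a question mark was, so the rows can be dropped in the next step.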
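Once that is done, you can drop the rows that contain NaN. Note that you need dropna(how='any'), not dropna(how='all'), to make sure rows with only some missing values are dropped as well. Also, dropna() does not operate in place by default (this is the default for most operations in current versions of pandas). Either assign the result back to dataset or pass inplace=True. So, to drop the rows containing NaN, use
dataset = dataset.dropna(how='any')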
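To make the difference concrete, here is a small sketch on made-up data (the values and the two column names are assumptions, not the real dataset). It shows that calling dropna without keeping the result leaves the NaN values in place, and that linkage runs once the cleaned frame is actually used.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch

# Made-up data: some rows have one missing value, no row is entirely empty.
dataset = pd.DataFrame({
    "al": [1.0, np.nan, 2.0, 0.0, 4.0, 3.0],
    "su": [0.0, 0.0, np.nan, 1.0, 2.0, 5.0],
})

# As in the question: the result is discarded, and how='all' would only
# drop rows where every value is missing, so nothing changes.
dataset.dropna(how="all")
still_has_nan = dataset.isna().any().any()
print("NaN still present:", still_has_nan)

# Keep the returned frame and drop rows with any missing value.
cleaned = dataset.dropna(how="any")
X = cleaned.values

# With only finite values, Ward linkage can be computed.
Z = sch.linkage(X, method="ward")
print(cleaned.shape, Z.shape)
```

In the question's code the same idea would mean assigning the dropna result back to dataset before building X.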