Question

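I'm a beginner in data science and need your help. I'm trying out unsupervised machine learning with K-means, but the resulting clusters are not spherical. I normalized the data, removed outliers, and so on. I tried several ways to fix it, but nothing worked.

Here is a picture (I took a few samples from the dataset to show you; the full dataset has 8000 rows):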

[image: sample of the dataset] ...

Answer 1

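Your data has 6 dimensions. You can't plot more than 2 (or at most 3) dimensions directly, so a scatter plot of two raw columns won't show the real cluster shapes. To visualize the data, project it down to 2 dimensions first with PCA or t-SNE.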
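As a rough sketch of that suggestion, the snippet below projects 6-dimensional data to 2-D with both PCA and t-SNE. The data here is a synthetic stand-in generated with `make_blobs` (an assumption, since the real 8000-row dataset isn't available), and the variable names are illustrative only.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 300 points, 6 features, 3 groups.
X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)

# Put all features on the same scale before projecting.
X_scaled = StandardScaler().fit_transform(X)

# Linear projection onto the 2 directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Non-linear embedding that tries to keep neighbouring points close.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_scaled)

print(X_pca.shape, X_tsne.shape)
```

Either 2-column array can then be passed to a scatter plot. PCA is fast and its axes are linear combinations of the original features; t-SNE often shows groups more clearly, but distances between far-apart groups in its output shouldn't be over-interpreted.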

Answer 2

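import pandas as pd
from sklearn.decomposition import PCA

# df is the normalized 6-column feature DataFrame from the question

# Project the 6 features onto the first 2 principal components
pca = PCA(n_components=2)

principalComponents = pca.fit_transform(df)

# Wrap the result in a DataFrame with named columns
principalDf = pd.DataFrame(data=principalComponents, columns=['principal component 1', 'principal component 2'])

principalDf.head(5)

I used PCA to reduce the 6 dimensions to 2, and it separates the data perfectly.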

Output: [image]
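One way to check whether a 2-D PCA plot can be trusted is to look at how much variance the two components keep, and to score the clusters in the full 6-D space rather than by eye. The following is a minimal sketch on synthetic stand-in data (an assumption); `n_clusters=3` is an arbitrary choice for illustration, not a value taken from the question.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 500 points, 6 features, 3 groups.
X, _ = make_blobs(n_samples=500, n_features=6, centers=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Cluster in the full 6-D space, not on the 2-D projection.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Project to 2-D only for plotting.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of the total variance kept by the 2 components.
kept = pca.explained_variance_ratio_.sum()

# Silhouette score is in [-1, 1]; higher means better-separated clusters.
score = silhouette_score(X_scaled, labels)

print(f"variance kept by 2 components: {kept:.2f}")
print(f"silhouette score in 6-D: {score:.2f}")
```

If the two components keep only a small share of the variance, clusters that look overlapping (or cleanly separated) in the plot may look different in the original space, so it's worth coloring the 2-D points by `labels` and reading the plot together with the score.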

K-means clustering is not spherical
