Question

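I have this dataset: https://archive.ics.uci.edu/ml/datasets/Auto+MPG. I have already fixed the missing values and normalized the data. How do I use k-means on it? Everything I have found so far only covers two variables.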

Answer 1

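You can use scikit-learn for k-means clustering. It accepts data with any number of features, not just two. The code below shows how to implement it.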

import numpy as np
from sklearn.cluster import KMeans

# ---------- DATA ----------------
np.random.seed(0)

# Generated training data: 500 samples with 25 features each
data = np.random.randint(1, 1000, size=(500, 25))

# Test data: 10 samples with 25 features each
test_data = np.random.randint(1, 1000, size=(10, 25))
# --------------------------------

# Fit KMeans from scikit-learn with 16 clusters on the training data
kmeans = KMeans(n_clusters=16, random_state=0).fit(data)

# Cluster label assigned to each training sample
kmean_labels = kmeans.labels_

# Predict the closest cluster for each test sample
predicted_labels = kmeans.predict(test_data)

For more details, see this link.
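As for why the scikit-learn code above is not limited to two variables: k-means only needs the distance between a sample and each centroid, and Euclidean distance is defined for any number of features. Below is a minimal sketch of one assignment-and-update step in plain Python. The function names, toy samples, and starting centroids are made up for illustration, and this is not how scikit-learn implements it internally.

```python
import math


def nearest_centroid(sample, centroids):
    """Return the index of the centroid closest to `sample` (Euclidean distance)."""
    # math.dist works for points of any dimension (Python 3.8+).
    distances = [math.dist(sample, c) for c in centroids]
    return distances.index(min(distances))


def update_centroids(samples, labels, k):
    """Recompute each centroid as the per-feature mean of its assigned samples."""
    n_features = len(samples[0])
    centroids = []
    for cluster in range(k):
        members = [s for s, lab in zip(samples, labels) if lab == cluster]
        # Assumes every cluster has at least one member (true for the toy data below).
        centroids.append(
            [sum(m[j] for m in members) / len(members) for j in range(n_features)]
        )
    return centroids


# Toy data: 4 samples with 3 features each (made-up values, not Auto MPG rows).
samples = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.9, 1.0, 1.0], [1.0, 0.9, 0.9]]
centroids = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # hypothetical starting centroids

# Assignment step: each sample joins its nearest centroid.
labels = [nearest_centroid(s, centroids) for s in samples]
# Update step: each centroid moves to the mean of its members.
centroids = update_centroids(samples, labels, k=2)

print(labels)  # [0, 0, 1, 1]
```

Nothing in the two functions depends on the number of columns, so the same steps apply to all the normalized Auto MPG features at once. In practice you would pass the whole feature array to KMeans as shown in the answer rather than writing this by hand.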
