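https://archive.ics.uci.edu/ml/datasets/Auto+MPG I have this dataset. I have already filled in the missing values and normalized the data. How do I use k-means on it? Everything I have found so far only covers two variables.
Answer 0 (score: 0)
You can use scikit-learn for k-means clustering. It works with any number of feature columns, not just two. See the following code for how to do it.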
from sklearn.cluster import KMeans
# ---------- DATA ----------------
import numpy as np
np.random.seed(0)
# generated training data
data = np.random.randint(1, 1000, size=(500, 25)) # data has 500 samples with 25 dim each
# testing data
test_data = np.random.randint(1, 1000, size=(10, 25)) # test_data has 10 samples with 25 dim each
# --------------------------------
# fit scikit-learn's KMeans on the training data
kmeans = KMeans(n_clusters=16, random_state=0).fit(data) # creating 16 clusters with the data
# labels for your clusters
kmean_labels = kmeans.labels_
# Predict the closest cluster for each sample
predicted_labels = kmeans.predict(test_data)
For more details, see this link.
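Applied to a table shaped like the one in the question, the same calls might look like the sketch below. The array here is synthetic random data standing in for seven numeric columns (it is not the real Auto MPG data), and the row count, `n_clusters=3`, and `n_init=10` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table with seven numeric columns (NOT the real data).
rng = np.random.default_rng(0)
features = rng.normal(size=(398, 7))  # row count chosen only for illustration

# Standardize each column; skip this step if the data is already normalized.
scaled = StandardScaler().fit_transform(features)

# k-means takes all columns at once; n_clusters=3 is an arbitrary assumption.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)

labels = model.labels_            # one cluster index per row
centers = model.cluster_centers_  # one centroid per cluster, one value per column
```

Each row of `centers` has one coordinate per input column, so the clusters live in the full feature space rather than in a 2-D slice of it.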