Question

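I'm looking for a way to partition a 2D array into n clusters using Python. I'd like to use the k-means method, but I haven't found any code for it. I tried the k-means implementation in the sklearn library, but I haven't figured out how to use it correctly.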

Answer 1

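In general, to use a model from sklearn, you have to:

Import it: from sklearn.cluster import KMeans
Initialize an object representing the model with the parameters you choose, for example kmeans = KMeans(n_clusters=2).
Train it on your data with the .fit() method: kmeans.fit(points). The kmeans object now holds all the data related to the trained model in its attributes. For example, kmeans.labels_ is an array containing the label of each point used to train the model.
Use the .predict(new_points) method to get the label of the closest cluster for a point or an array of points.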

You can find all the attributes on the KMeans documentation page: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
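The four steps above can be sketched roughly as follows. The sample data, the names points and new_points, and the explicit n_init and random_state values are assumptions made for illustration only. They are not part of the original answer.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated groups of 2D points.
points = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.4],
                   [9.0, 9.0], [9.5, 9.2], [9.1, 9.4]])

# Steps 1-2: import the model and initialize it with the chosen parameters.
# n_init and random_state are set explicitly only to keep runs reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# Step 3: train on the data; labels_ then holds one cluster index per point.
kmeans.fit(points)
labels = kmeans.labels_

# Step 4: assign new points to the nearest learned cluster center.
new_points = np.array([[0.2, 0.1], [9.2, 9.1]])
new_labels = kmeans.predict(new_points)

print(labels)
print(new_labels)
```

Which group ends up as cluster 0 and which as cluster 1 is arbitrary, so it is safer to compare labels with each other than to rely on specific numbers.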

Answer 2

From http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans

from sklearn.cluster import KMeans
import numpy as np

# this is your array of data points, one row per point
X = np.array([[1, 2], [1, 4], [1, 0],
               [4, 2], [4, 4], [4, 0]])


# This creates the clustering model and fits it to the data
# n_clusters is the number of clusters you want to split your data into
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

# you can see the labels with:
print(kmeans.labels_)

# the output will be something like:
#array([0, 0, 0, 1, 1, 1], dtype=int32)
# the values (0, 1) tell you which cluster each of your data points belongs to

# You can predict the cluster of new points with:
kmeans.predict([[0, 0], [4, 4]])

#array([0, 1], dtype=int32)

# or see where the centers of your clusters are:
kmeans.cluster_centers_
#array([[ 1.,  2.],
#     [ 4.,  2.]])
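If the goal is to actually split the array into n separate sub-arrays, one possible approach is to group the rows by their label. This is only a sketch: split_into_clusters is a hypothetical helper, not part of scikit-learn, and the data below is made up with wider spacing than the example above so that the grouping is unambiguous.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_into_clusters(X, n_clusters):
    """Hypothetical helper: return a list with one sub-array of rows per cluster."""
    X = np.asarray(X, dtype=float)
    # n_init and random_state are fixed here only for reproducibility.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # A boolean mask on labels_ selects the rows assigned to cluster k.
    return [X[kmeans.labels_ == k] for k in range(n_clusters)]

# Illustrative data: two groups far apart along the first coordinate.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

groups = split_into_clusters(X, 2)
for k, group in enumerate(groups):
    print(k, group.tolist())
```

The order of the returned groups follows the cluster indices, which are arbitrary, so the same group may come first or second depending on the initialization.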
