I am trying to use KMeans in Python.
data = [[1,2,3,4,5],[1,0,3,2,4],[4,3,234,5,5],[23,4,5,1,4],[23,5,2,3,5]]
Each data point has a label. For example:
[1,2,3,4,5] -> Fiat1
[1,0,3,2,4] -> Fiat2
[4,3,234,5,5] -> Mercedes
[23,4,5,1,4] -> Opel
[23,5,2,3,5] -> bmw
from sklearn.cluster import KMeans
kmeans = KMeans(init='k-means++', n_clusters=3, n_init=10)
kmeans.fit(data)
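My goal: after running KMeans, I want to get the labels that fall into each cluster.
A made-up example:
Cluster 1: Fiat1, Fiat2
Cluster 2: Mercedes
Cluster 3: bmw, Opel
How can I do this?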
Answer 0 (score: 1)
from sklearn.cluster import KMeans
import numpy as np
data = np.array([[1,2,3,4,5],[1,0,3,2,4],[4,3,234,5,5],[23,4,5,1,4],[23,5,2,3,5]])
labels = np.array(['Fiat1', 'Fiat2', 'Mercedes', 'Opel', 'BMW'])
N_CLUSTERS = 3
kmeans = KMeans(init='k-means++', n_clusters=N_CLUSTERS, n_init=10)
kmeans.fit(data)
pred_classes = kmeans.predict(data)
for cluster in range(N_CLUSTERS):
    print('cluster:', cluster)
    # Select the labels of the rows assigned to this cluster
    print(labels[np.where(pred_classes == cluster)])
cluster: 0
['Opel' 'BMW']
cluster: 1
['Mercedes']
cluster: 2
['Fiat1' 'Fiat2']
Answer 1 (score: 1)
If you have the labels in an array:
labels = ['Fiat1', 'Fiat2', 'Mercedes', 'Opel', 'bmw']
Then,
n_clusters = 3
pred_clusters = kmeans.fit(data).labels_
cluster_labels = [[] for i in range(n_clusters)]
for i, j in enumerate(pred_clusters):
    # i is the row index, j is the cluster that row was assigned to
    cluster_labels[j].append(labels[i])
will give you cluster_labels, which is the list of data labels in each cluster.
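The grouping step on its own can also be sketched with a dictionary, which avoids having to know the number of clusters in advance. This is only a minimal illustration: the helper name group_labels_by_cluster is hypothetical, and the assignments list is hard-coded to match the sample output shown in Answer 0 rather than computed. In practice you would pass kmeans.labels_ (or the result of kmeans.predict(data)) instead, and the cluster numbers may differ from run to run.

```python
from collections import defaultdict


def group_labels_by_cluster(assignments, labels):
    """Map each cluster index to the list of labels assigned to it.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if len(assignments) != len(labels):
        raise ValueError("assignments and labels must have the same length")
    groups = defaultdict(list)
    for cluster, label in zip(assignments, labels):
        # int() also accepts NumPy integer types such as those in kmeans.labels_
        groups[int(cluster)].append(label)
    return dict(groups)


labels = ['Fiat1', 'Fiat2', 'Mercedes', 'Opel', 'bmw']
# Assumed cluster assignments, copied from the sample output in Answer 0.
# With a fitted model this would be kmeans.labels_ instead.
assignments = [2, 2, 1, 0, 0]

clusters = group_labels_by_cluster(assignments, labels)
for cluster in sorted(clusters):
    print(cluster, clusters[cluster])
```

Because the result is keyed by cluster index, clusters that end up empty simply do not appear, and the labels within each cluster keep the order of the original data.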