Why do I see 3 clusters when I specify 4? (Python KMeans clustering)

Time: 2019-03-29 10:41:44

Tags: plot cluster-computing k-means

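I am applying KMeans to a car data frame in Python. From the elbow curve, I get 4 as the optimal number of clusters. I want to display these 4 clusters in 2 dimensions, so I use PCA to reduce the data. The data frame and the code that applies PCA to it are shown below. My problem is that I am trying to display 4 clusters, but the PCA plot shows only 3. Why doesn't it show 4 clusters?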
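For context, the elbow step mentioned above can be sketched roughly as follows. This is only an illustration: the post does not show the original elbow code, so the data here is a synthetic stand-in generated with `make_blobs`, and the range of `k` values is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled car features (assumption: 7 numeric
# columns and 4 reasonably separated groups; this is not the real cars_df).
X, _ = make_blobs(n_samples=400, centers=4, n_features=7, random_state=0)
X = StandardScaler().fit_transform(X)

# Fit KMeans for k = 1..8 and record the within-cluster sum of squares,
# which scikit-learn exposes as the inertia_ attribute.
ks = list(range(1, 9))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# The "elbow" is the k after which the inertia stops dropping sharply.
for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `ks` against `inertias` gives the elbow curve; the bend in that curve is what suggests a cluster count such as 4.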

Data frame: cars_df

    cyl disp    hp      wt      acc     mpg     age     group
0   4.0 307.0   130.0   3504.0  12.0    18.0    13.0    0
1   4.0 350.0   165.0   3693.0  11.5    15.0    13.0    0
2   4.0 318.0   150.0   3436.0  11.0    18.0    13.0    0
3   4.0 304.0   150.0   3433.0  12.0    16.0    13.0    0
4   4.0 302.0   140.0   3449.0  10.5    17.0    13.0    0
5   4.0 148.5   93.5    4341.0  10.0    15.0    13.0    0
6   4.0 148.5   93.5    4354.0  15.5    14.0    13.0    0
7   4.0 148.5   93.5    4312.0  15.5    14.0    13.0    0
8   4.0 148.5   93.5    4425.0  10.0    14.0    13.0    0
9   4.0 148.5   93.5    3850.0  15.5    15.0    13.0    0
10  4.0 148.5   93.5    3563.0  10.0    15.0    13.0    0
11  4.0 340.0   160.0   3609.0  15.5    14.0    13.0    0
12  4.0 148.5   150.0   3761.0  15.5    15.0    13.0    0
13  4.0 148.5   93.5    3086.0  10.0    14.0    13.0    0
14  4.0 113.0   95.0    2372.0  15.0    24.0    13.0    1
15  6.0 198.0   95.0    2833.0  15.5    22.0    13.0    3
16  6.0 199.0   97.0    2774.0  15.5    18.0    13.0    3
17  6.0 200.0   85.0    2587.0  16.0    21.0    13.0    3
18  4.0 97.0    88.0    2130.0  14.5    27.0    13.0    1
19  4.0 97.0    46.0    1835.0  20.5    26.0    13.0    1
20  4.0 110.0   87.0    2672.0  17.5    25.0    13.0    1
21  4.0 107.0   90.0    2430.0  14.5    24.0    13.0    1
22  4.0 104.0   95.0    2375.0  17.5    25.0    13.0    1
23  4.0 121.0   113.0   2234.0  12.5    26.0    13.0    1
24  6.0 199.0   90.0    2648.0  15.0    21.0    13.0    3
25  4.0 148.5   93.5    2803.5  14.0    10.0    13.0    0
26  4.0 307.0   93.5    4376.0  15.0    10.0    13.0    0
27  4.0 318.0   93.5    4382.0  13.5    11.0    13.0    0
28  4.0 304.0   93.5    2803.5  18.5    9.0     13.0    3
29  4.0 97.0    88.0    2130.0  14.5    27.0    12.0    1
... ... ... ... ... ... ... ... ...
368 4.0 112.0   88.0    2640.0  18.6    27.0    1.0     2
369 4.0 112.0   88.0    2395.0  18.0    34.0    1.0     2
370 4.0 112.0   85.0    2575.0  16.2    31.0    1.0     2
371 4.0 135.0   84.0    2525.0  16.0    29.0    1.0     2
372 4.0 151.0   90.0    2735.0  18.0    27.0    1.0     2
373 4.0 140.0   92.0    2865.0  16.4    24.0    1.0     2
374 4.0 151.0   93.5    3035.0  20.5    23.0    1.0     2
375 4.0 105.0   74.0    1980.0  15.3    36.0    1.0     2
376 4.0 91.0    68.0    2025.0  18.2    37.0    1.0     2
377 4.0 91.0    68.0    1970.0  17.6    31.0    1.0     2
378 4.0 105.0   63.0    2125.0  14.7    38.0    1.0     2
379 4.0 98.0    70.0    2125.0  17.3    36.0    1.0     2
380 4.0 120.0   88.0    2160.0  14.5    36.0    1.0     2
381 4.0 107.0   75.0    2205.0  14.5    36.0    1.0     2
382 4.0 108.0   70.0    2245.0  16.9    34.0    1.0     2
383 4.0 91.0    67.0    1965.0  15.0    38.0    1.0     2
384 4.0 91.0    67.0    1965.0  15.7    32.0    1.0     2
385 4.0 91.0    67.0    1995.0  16.2    38.0    1.0     2
386 6.0 181.0   110.0   2945.0  16.4    25.0    1.0     2
387 6.0 262.0   85.0    3015.0  17.0    38.0    1.0     2
388 4.0 156.0   92.0    2585.0  14.5    26.0    1.0     2
389 6.0 232.0   112.0   2835.0  14.7    22.0    1.0     3
390 4.0 144.0   96.0    2665.0  13.9    32.0    1.0     2
391 4.0 135.0   84.0    2370.0  13.0    36.0    1.0     2
392 4.0 151.0   90.0    2950.0  17.3    27.0    1.0     2
393 4.0 140.0   86.0    2790.0  15.6    27.0    1.0     2
394 4.0 97.0    52.0    2130.0  15.5    23.0    1.0     2
395 4.0 135.0   84.0    2295.0  11.6    32.0    1.0     2
396 4.0 120.0   79.0    2625.0  18.6    28.0    1.0     2
397 4.0 119.0   82.0    2720.0  19.4    31.0    1.0     2
398 rows × 8 columns

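import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Separate the features from the existing 'group' column
X = cars_df.drop('group', axis=1)
y = cars_df.pop('group')

# Standardize the features before PCA
X = StandardScaler().fit_transform(X)

# Use PCA (Principal Component Analysis), keeping enough components
# to explain 95% of the variance
pca = PCA(.95)
pca.fit(X)

num_clusters = 4
data2D = pca.transform(X)
# `cluster` is assumed to be a KMeans model fitted elsewhere (not shown in the post)
centers2D = pca.transform(cluster.cluster_centers_)
labels = cluster.labels_
colors = ['#000000', '#FFFFFF', '#FF0000', '#00FF00', '#0000FF']
col_map = dict(zip(set(labels), colors))
label_color = [col_map[l] for l in labels]
# Plot the cluster points using the first two principal components
plt.scatter(data2D[:, 0], data2D[:, 1], c=label_color)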

3 clusters instead of 4
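One possible explanation, offered as a sketch rather than a confirmed diagnosis: the second entry in `colors` is `'#FFFFFF'` (white), and matplotlib's default axes background is also white. If one of the four labels is mapped to that entry, its points are still plotted but cannot be seen. The snippet below checks this with plain Python. The `labels` list is a hypothetical stand-in for `cluster.labels_`, and the helper names `build_color_map` and `invisible_labels` are made up for illustration.

```python
# Palette copied from the question; the second entry is pure white.
colors = ['#000000', '#FFFFFF', '#FF0000', '#00FF00', '#0000FF']

# Hypothetical labels standing in for cluster.labels_ (4 clusters: 0..3).
labels = [0, 0, 1, 1, 2, 2, 3, 3]

# Matplotlib's default axes background is white.
BACKGROUND = '#FFFFFF'


def build_color_map(labels, palette):
    """Map each distinct label to a palette colour, in sorted label order."""
    return dict(zip(sorted(set(labels)), palette))


def invisible_labels(col_map, background=BACKGROUND):
    """Return the labels whose colour equals the plot background."""
    return [lab for lab, col in col_map.items()
            if col.lower() == background.lower()]


col_map = build_color_map(labels, colors)
hidden = invisible_labels(col_map)
print("clusters in the colour map:", len(col_map))
print("clusters drawn in the background colour:", hidden)

# A palette with no white entry keeps every cluster visible.
safe_colors = ['#000000', '#FFA500', '#FF0000', '#00FF00', '#0000FF']
safe_map = build_color_map(labels, safe_colors)
print("hidden with the replacement palette:", invisible_labels(safe_map))
```

If this is the cause, replacing the white entry in the palette, or passing the labels directly with something like `plt.scatter(data2D[:, 0], data2D[:, 1], c=labels, cmap='viridis')`, should make the fourth cluster appear. Separately, `PCA(.95)` keeps as many components as are needed to explain 95% of the variance, which may be more than two, so clusters can also overlap when only the first two components are plotted; `PCA(n_components=2)` makes the 2D projection explicit.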

0 answers:

No answers