Plotting the centroids of clusters in Python

Time: 2016-11-29 08:29:17

Tags: python numpy matplotlib scikit-learn

I'm a Python beginner. I'm trying to plot the cluster centers, but I can't get it to work. Here is my code:

import pandas as pd
import numpy as np

df = pd.read_csv("InputClusterModel.txt")
df.columns = ["Major","Quantity","rating","rating_2","RightWindoWeek","Ranking","CopiesQuant","Content","Trump","Movies","Carton","Serial","Before1014","categor","Purchase","Revenue"]
df.head()

from sklearn.cluster import KMeans

cluster = KMeans(n_clusters=2)

df['cluster'] = cluster.fit_predict(df[df.columns[:15]])

from sklearn.decomposition import PCA
x_cols = df.columns[1:]

pca = PCA()
df['x'] = pca.fit_transform(df[x_cols])[:,0]

df['y'] = pca.fit_transform(df[x_cols])[:,1]

df = df.reset_index()

clusters = df[['Purchase', 'cluster', 'x', 'y']]

clusters.head()

%matplotlib inline
from ggplot import *

ggplot(df, aes(x='x', y='y', color='cluster')) + \
    geom_point(size=75) + \
    ggtitle("Grouped by Cluster")

df.cluster.value_counts()
# The part below is where I get the error:

cluster_centers = pca.transform(cluster.cluster_centers_)
cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y'])
cluster_centers['cluster'] = range(0, len(cluster_centers))

ggplot(cluster, aes(x='x', y='y', color='cluster')) + \
    geom_point(size=100) + \
    geom_point(cluster_centers, size=500) +\
    ggtitle("Customers Grouped by Cluster")
print(pca.explained_variance_ratio_)

Here is the error I get:

ValueError                                Traceback (most recent call last)
<ipython-input-18-c2ac22e32b75> in <module>()
----> 1 cluster_centers = pca.transform(cluster.cluster_centers_)
      2 cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y'])
      3 cluster_centers['cluster'] = range(0, len(cluster_centers))
      4 
      5 ggplot(cluster, aes(x='x', y='y', color='cluster')) +     geom_point(size=100) +     geom_point(cluster_centers, size=500) +     ggtitle("Customers Grouped by Cluster")

/home/belotelov/anaconda2/lib/python2.7/site-packages/sklearn/decomposition/base.pyc in transform(self, X, y)
    130         X = check_array(X)
    131         if self.mean_ is not None:
--> 132             X = X - self.mean_
    133         X_transformed = fast_dot(X, self.components_.T)
    134         if self.whiten:

ValueError: operands could not be broadcast together with shapes (2,15) (16,)

My data looks like this:

0,122,7,8,6,8,105.704,1,0,1,0,0,0,0,37426,11831762
1,278,8,8,12,2,2246,1,1,1,0,0,0,0,29316,7371029
1,275,6,6,14,1,1268,1,1,1,0,0,0,0,30693,7368787
0,125,5,5,5,1,105.704,1,0,1,0,0,0,0,20661,7337545
1,193,8,8,11,2,1063,1,1,1,0,0,0,0,29141,7279077
1,1,6,6,11,0,1236,1,1,0,1,0,0,0,879,325151
1,116,8,8,14,0,1209,1,1,0,1,0,0,0,17751,5529657
0,39,7,7,11,1,1128,1,1,1,0,0,0,0,15044,5643468
1,65,6,6,11,0,1209,1,1,0,1,0,0,0,9902,2612669
0,170,6,7,2,0,105.704,1,1,1,0,0,0,0,19167,5195321
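A side note on loading this data: the question calls pd.read_csv("InputClusterModel.txt") without header=None, so pandas treats the first data row as the header, and reassigning df.columns afterwards silently discards that row. A minimal sketch, assuming the real file has no header line like the sample above (an in-memory string stands in for the file here):

```python
import io

import pandas as pd

# The ten sample rows from the question; assumed to have no header line.
SAMPLE = """\
0,122,7,8,6,8,105.704,1,0,1,0,0,0,0,37426,11831762
1,278,8,8,12,2,2246,1,1,1,0,0,0,0,29316,7371029
1,275,6,6,14,1,1268,1,1,1,0,0,0,0,30693,7368787
0,125,5,5,5,1,105.704,1,0,1,0,0,0,0,20661,7337545
1,193,8,8,11,2,1063,1,1,1,0,0,0,0,29141,7279077
1,1,6,6,11,0,1236,1,1,0,1,0,0,0,879,325151
1,116,8,8,14,0,1209,1,1,0,1,0,0,0,17751,5529657
0,39,7,7,11,1,1128,1,1,1,0,0,0,0,15044,5643468
1,65,6,6,11,0,1209,1,1,0,1,0,0,0,9902,2612669
0,170,6,7,2,0,105.704,1,1,1,0,0,0,0,19167,5195321
"""

COLUMNS = ["Major", "Quantity", "rating", "rating_2", "RightWindoWeek", "Ranking",
           "CopiesQuant", "Content", "Trump", "Movies", "Carton", "Serial",
           "Before1014", "categor", "Purchase", "Revenue"]

# Default behaviour: the first row becomes the header, leaving 9 data rows.
print(pd.read_csv(io.StringIO(SAMPLE)).shape)  # (9, 16)

# header=None keeps every row as data; names= assigns the 16 labels directly.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
print(df.shape)  # (10, 16)
```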

P.S. Python 2.7.12 :: Anaconda custom (64-bit) on Debian Jessie

1 Answer:

Answer 0 (score: 0)

I haven't gone through your code line by line. Here is a comment on the error:

ValueError: operands could not be broadcast together with shapes (2,15) (16,)

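As the error says, X = X - self.mean_ tries to broadcast two arrays with incompatible shapes. NumPy's broadcasting rule compares the shapes starting from the trailing dimension: each pair of axis lengths must either be equal or one of them must be 1. Here the trailing lengths are 15 and 16, so the subtraction fails.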
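The broadcasting rule can be checked directly with zero arrays of the same shapes as in the error. This is a small illustration only; the array names are made up, and the exact wording of NumPy's error message may differ between versions, so it is printed rather than quoted:

```python
import numpy as np

centers = np.zeros((2, 15))  # same shape as cluster.cluster_centers_ (2 clusters x 15 features)
mean_16 = np.zeros(16)       # same shape as pca.mean_ when PCA was fit on 16 columns
mean_15 = np.zeros(15)       # shape pca.mean_ would have if PCA used the same 15 columns

# Trailing axes 15 vs 16: neither equal nor 1, so broadcasting fails.
try:
    centers - mean_16
except ValueError as err:
    print("ValueError:", err)

# Trailing axes 15 vs 15 match, so the (15,) vector is applied to each row.
print((centers - mean_15).shape)  # (2, 15)

# An axis of length 1 also broadcasts: (2, 15) - (2, 1) works.
print((centers - np.zeros((2, 1))).shape)  # (2, 15)
```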

I suggest searching for the error message you got and looking at this
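The two shapes come from the question's code. cluster.fit_predict(df[df.columns[:15]]) fits KMeans on 15 columns (Major through Purchase), so cluster_centers_ has shape (2, 15). But x_cols = df.columns[1:] is evaluated after the cluster column has been added, so PCA is fit on 16 columns (Quantity through Revenue, plus cluster) and pca.mean_ has shape (16,). Separately, the second ggplot(cluster, ...) call passes the KMeans object instead of a DataFrame. A minimal sketch of one possible fix, under stated assumptions: it keeps the question's choice of clustering on the first 15 columns, fits PCA once on that same list, projects the centers with the same fitted pca, and plots with matplotlib instead of the ggplot package (a swap, since matplotlib is in the tags). The random DataFrame is a hypothetical stand-in for the real file:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it in a notebook using %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

COLUMNS = ["Major", "Quantity", "rating", "rating_2", "RightWindoWeek", "Ranking",
           "CopiesQuant", "Content", "Trump", "Movies", "Carton", "Serial",
           "Before1014", "categor", "Purchase", "Revenue"]

# Hypothetical stand-in data; replace with the DataFrame read from InputClusterModel.txt.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, len(COLUMNS))), columns=COLUMNS)

# Choose the feature columns ONCE, before derived columns ('cluster', 'x', 'y') exist.
features = COLUMNS[:15]  # Major .. Purchase, as in the question's KMeans call
X = df[features].to_numpy()

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X)

# Fit PCA once, on the same features, keeping the first two components.
pca = PCA(n_components=2)
coords = pca.fit_transform(X)
df["x"], df["y"] = coords[:, 0], coords[:, 1]

# cluster_centers_ now has as many columns as pca.mean_, so transform() works.
centers_2d = pca.transform(kmeans.cluster_centers_)
cluster_centers = pd.DataFrame(centers_2d, columns=["x", "y"])
cluster_centers["cluster"] = range(len(cluster_centers))

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], c=df["cluster"], cmap="viridis", s=20)
ax.scatter(cluster_centers["x"], cluster_centers["y"], c="red", marker="X", s=200,
           label="centroids")
ax.set_title("Customers Grouped by Cluster")
ax.legend()
fig.savefig("clusters.png")
print(pca.explained_variance_ratio_)
```

Whatever columns are chosen, the point is that KMeans and PCA see exactly the same feature list. Because PCA's transform is linear plus a shift, projecting the centroids this way places them at the centers of the projected clusters.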