I am a beginner in Python. I am trying to plot the cluster centers, but I cannot get it to work. Here is my code:
import pandas as pd
import numpy as np
df = pd.read_csv("InputClusterModel.txt")
df.columns = ["Major","Quantity","rating","rating_2","RightWindoWeek","Ranking","CopiesQuant","Content","Trump","Movies","Carton","Serial","Before1014","categor","Purchase","Revenue"]
df.head()
from sklearn.cluster import KMeans
cluster = KMeans(n_clusters=2)
df['cluster'] = cluster.fit_predict(df[df.columns[:15]])
from sklearn.decomposition import PCA
x_cols = df.columns[1:]
pca = PCA()
df['x'] = pca.fit_transform(df[x_cols])[:,0]
df['y'] = pca.fit_transform(df[x_cols])[:,1]
df = df.reset_index()
clusters = df[['Purchase', 'cluster', 'x', 'y']]
clusters.head()
%matplotlib inline
from ggplot import *
ggplot(df, aes(x='x', y='y', color='cluster')) + \
geom_point(size=75) + \
ggtitle("Grouped by Cluster")
df.cluster.value_counts()
# The error occurs in the part below:
cluster_centers = pca.transform(cluster.cluster_centers_)
cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y'])
cluster_centers['cluster'] = range(0, len(cluster_centers))
ggplot(cluster, aes(x='x', y='y', color='cluster')) + \
geom_point(size=100) + \
geom_point(cluster_centers, size=500) +\
ggtitle("Customers Grouped by Cluster")
print(pca.explained_variance_ratio_)
This is the error I get:
ValueError                                Traceback (most recent call last)
<ipython-input-18-c2ac22e32b75> in <module>()
----> 1 cluster_centers = pca.transform(cluster.cluster_centers_)
2 cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y'])
3 cluster_centers['cluster'] = range(0, len(cluster_centers))
4
5 ggplot(cluster, aes(x='x', y='y', color='cluster')) + geom_point(size=100) + geom_point(cluster_centers, size=500) +
ggtitle("Customers Grouped by Cluster")
/home/belotelov/anaconda2/lib/python2.7/site-packages/sklearn/decomposition/base.pyc in transform(self, X, y)
130 X = check_array(X)
131 if self.mean_ is not None:
--> 132 X = X - self.mean_
133 X_transformed = fast_dot(X, self.components_.T)
134 if self.whiten:
ValueError: operands could not be broadcast together with shapes (2,15) (16,)
My data looks like this:
0,122,7,8,6,8,105.704,1,0,1,0,0,0,0,37426,11831762
1,278,8,8,12,2,2246,1,1,1,0,0,0,0,29316,7371029
1,275,6,6,14,1,1268,1,1,1,0,0,0,0,30693,7368787
0,125,5,5,5,1,105.704,1,0,1,0,0,0,0,20661,7337545
1,193,8,8,11,2,1063,1,1,1,0,0,0,0,29141,7279077
1,1,6,6,11,0,1236,1,1,0,1,0,0,0,879,325151
1,116,8,8,14,0,1209,1,1,0,1,0,0,0,17751,5529657
0,39,7,7,11,1,1128,1,1,1,0,0,0,0,15044,5643468
1,65,6,6,11,0,1209,1,1,0,1,0,0,0,9902,2612669
0,170,6,7,2,0,105.704,1,1,1,0,0,0,0,19167,5195321
P.S. Python 2.7.12 :: Anaconda custom (64-bit) on Debian Jessie
Answer 0 (score: 0)
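I did not go through your code line by line. Here is a comment on the error:
ValueError: operands could not be broadcast together with shapes (2,15) (16,)
As the error says, the subtraction X = X - self.mean_ is trying to broadcast two arrays with incompatible shapes. The broadcasting rule is that the lengths of the trailing axes must either match or one of them must be 1. Here they are 15 and 16, so neither condition holds.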
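The shapes in the message can be reproduced with plain NumPy. This is a minimal sketch, assuming placeholder arrays of zeros that stand in for `cluster.cluster_centers_` (2 centers, 15 features) and `pca.mean_` (16 features). The variable names are illustrative only and do not come from the question.

```python
import numpy as np

# Placeholder arrays with the shapes from the traceback (values are made up).
centers = np.zeros((2, 15))   # stands in for cluster.cluster_centers_
mean_16 = np.zeros(16)        # stands in for pca.mean_ when PCA saw 16 columns
mean_15 = np.zeros(15)        # the mean if PCA had seen the same 15 columns

try:
    centers - mean_16         # trailing axes are 15 and 16: incompatible
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print(exc)

# Trailing axes are both 15, so the 1-D mean is broadcast across both rows.
centered = centers - mean_15
print(centered.shape)
```

The subtraction only succeeds once both operands describe the same number of features.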
I suggest searching for the error message and reading this.
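Looking at the code in the question, `KMeans` is fit on `df.columns[:15]` (15 columns), while `PCA` is fit on `df.columns[1:]` after the `cluster` column has been added (16 columns), which would explain the two shapes. One possible fix is to fit both on the same feature columns. The following is a minimal sketch, assuming random data in place of InputClusterModel.txt and `n_components=2` so that the projected centers have exactly an `x` and a `y` coordinate. The names `features`, `points_2d` and `centers_2d` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Random stand-in for the 15 feature columns (assumption: the real data
# would come from df[df.columns[:15]]).
rng = np.random.RandomState(0)
features = rng.rand(50, 15)

# Fit the clustering and the projection on the SAME columns.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# Keep two components and fit once, then reuse the result for x and y.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(features)

# The centers live in the same 15-dimensional space, so transform works.
centers_2d = pca.transform(kmeans.cluster_centers_)
print(points_2d.shape, centers_2d.shape)
```

With this layout, `points_2d[:, 0]` and `points_2d[:, 1]` can be stored as the `x` and `y` columns, and `centers_2d` can be plotted on top of the points.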