Converting data for KMeans and PCA

Time: 2015-06-30 21:09:18

Tags: python numpy scipy scikit-learn k-means

I have a dataset that looks like this:

search_terms = ['computer','usb port', 'phone adaptor']
clicks = [3,2,1]
bounce = [0,0,2]
conversion = [4,1,0]

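I want to feed it into a KMeans model, but I can't figure out how to convert the lists into a matrix format that KMeans can take. I'd also like to reduce the dimensionality with PCA so I can visualize the result in a 2D plot.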

Here is my code:

X = np.array(clicks, bounce, conversion)
y = np.array(search_terms)
num_clusters = 3

pca = PCA(n_components=2, whiten=True).fit(X)
X_pca = pca.transform(X)

km=KMeans(n_clusters=num_clusters, init='k-means++',n_init=10, verbose=1)
km.fit(X_pca)

print km.labels_[:10]

This is the error I get:

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
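The problem most likely starts with how np.array is called: it takes a single array-like as its first argument, and its second positional parameter is dtype, so np.array(clicks, bounce, conversion) never stacks the three lists (the exact exception varies across NumPy versions). A minimal sketch of two correct ways to build the matrix, using only the lists from the question; np.column_stack is a swapped-in alternative, not something the question used:

```python
import numpy as np

clicks = [3, 2, 1]
bounce = [0, 0, 2]
conversion = [4, 1, 0]

# Wrap the lists in ONE outer list; this gives one row per feature
# (row 0 = clicks, row 1 = bounce, row 2 = conversion).
features_by_row = np.array([clicks, bounce, conversion])

# np.column_stack puts each list in a column instead, giving one row per
# search term: the (n_samples, n_features) layout scikit-learn expects.
X = np.column_stack([clicks, bounce, conversion])
print(X.shape)  # (3, 3)
print(np.array_equal(X, features_by_row.T))  # True
```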

Also, once the clustering is done, I'd like to see which search terms belong to which cluster, so I'm not sure whether setting y = np.array(search_terms) is correct.
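Keeping y = np.array(search_terms) works as a lookup table: KMeans is unsupervised, so y is never passed to fit, but because row i of X is search term i, km.labels_[i] is the cluster of y[i]. A sketch under that assumption, reusing the question's data and parameters (verbose is omitted; the dictionary grouping is just one way to report the result):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

search_terms = ['computer', 'usb port', 'phone adaptor']
clicks = [3, 2, 1]
bounce = [0, 0, 2]
conversion = [4, 1, 0]

X = np.array([clicks, bounce, conversion]).T  # one row per search term
y = np.array(search_terms)                    # used only for reporting

num_clusters = 3
X_pca = PCA(n_components=2, whiten=True).fit_transform(X)
km = KMeans(n_clusters=num_clusters, init='k-means++', n_init=10).fit(X_pca)

# km.labels_[i] is the cluster of row i of X, i.e. of search term y[i].
for term, label in zip(y, km.labels_):
    print(f"{term} -> cluster {label}")

# Group the terms per cluster with a boolean mask on y.
clusters = {k: y[km.labels_ == k].tolist() for k in range(num_clusters)}
print(clusters)
```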

Please advise.

2 answers:

Answer 0 (score: 3)

The following code should work. Let me know if it doesn't.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

search_terms = ['computer','usb port', 'phone adaptor']
clicks = [3,2,1]
bounce = [0,0,2]
conversion = [4,1,0]

X = np.array([clicks, bounce, conversion]).T  # transpose: one row per search term
y = np.array(search_terms)

num_clusters = 3

X_pca = PCA(n_components=2, whiten=True).fit_transform(X)

km = KMeans(n_clusters=num_clusters, init='k-means++', n_init=10, verbose=1)
km.fit(X_pca)
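One caveat with the sample data: it has only three rows, so num_clusters = 3 leaves k-means nothing to group, and every search term ends up in its own cluster. A sketch that checks this and contrasts it with two clusters; random_state=0 and n_clusters=2 are illustrative choices, not part of the original code. On the real dataset num_clusters should be well below the number of rows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.array([[3, 0, 4], [2, 0, 1], [1, 2, 0]])  # rows: search terms
X_pca = PCA(n_components=2, whiten=True).fit_transform(X)

# As many clusters as samples: each row becomes its own cluster,
# so the within-cluster sum of squares (inertia_) is zero.
km3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_pca)
print(len(set(km3.labels_)))  # 3

# Fewer clusters than samples: at least two terms must share a cluster.
km2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_pca)
print(len(set(km2.labels_)))  # 2
```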

Answer 1 (score: 0)

How come you don't get this

>>> X=np.array(clicks,bounce,conversion)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: only 2 non-keyword arguments accepted

error?

I assume you want each data item (search term) arranged as a row:

X=np.array([clicks,bounce,conversion]).transpose()

If you want them arranged as columns instead, drop the .transpose().
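To see what the .transpose() changes, here is a short sketch comparing both layouts on the question's data. PCA and KMeans treat each row as one sample, so only the transposed version clusters the search terms; without it, they would treat clicks, bounce and conversion as the samples.

```python
import numpy as np

search_terms = ['computer', 'usb port', 'phone adaptor']
clicks = [3, 2, 1]
bounce = [0, 0, 2]
conversion = [4, 1, 0]

# With .transpose(): rows are search terms, columns are features.
by_row = np.array([clicks, bounce, conversion]).transpose()
print(search_terms[0], by_row[0])  # computer [3 0 4]

# Without .transpose(): rows are features, columns are search terms.
by_col = np.array([clicks, bounce, conversion])
print(by_col[0])  # [3 2 1]  (the clicks of all three terms)
```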