How do I make an elbow curve from a term-document (tf_idf) matrix?

Time: 2017-04-04 09:03:28

Tags: python jupyter-notebook k-means tf-idf

I tried the following code to get an elbow curve for tfidf_matrix (a term-document frequency matrix), but I get the error shown in the attached image. What can be done to fix this?

from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans

K = range(1,50)
KM = [KMeans(n_clusters=k).fit(tfidf_matrix) for k in K]
centroids = [k.cluster_centers_ for k in KM]

D_k = [cdist(tfidf_matrix, cent, 'euclidian') for cent in centroids]
cIdx = [np.argmin(D,axis=1) for D in D_k]
dist = [np.min(D,axis=1) for D in D_k]
avgWithinSS = [sum(d)/tfidf_matrix.shape[0] for d in dist]

# Total within-cluster sum of squares
wcss = [sum(d**2) for d in dist]
tss = sum(pdist(tfidf_matrix)**2)/dt_trans.shape[0]
bss = tss-wcss

kIdx = 10-1

tfidf_matrix is the term-document frequency matrix we obtained from the documents.
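The error itself is only visible in the image, so its cause cannot be confirmed here. The sketch below is one way the computation above might be made to run, fixing problems that can be seen in the code: `np` is never imported, the metric name is spelled `'euclidian'` (SciPy expects `'euclidean'`), `dt_trans` is undefined (presumably `tfidf_matrix` was meant), and `tss-wcss` subtracts a list from a number. It also assumes `tfidf_matrix` is a SciPy sparse matrix produced by `TfidfVectorizer`, which `cdist` and `pdist` do not accept without densifying first. The toy corpus, the smaller range for `K`, and the `n_init`/`random_state` arguments are assumptions added for illustration and are not part of the question.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the real documents.
docs = [
    "the cat sat on the mat",
    "a cat and a kitten played on the mat",
    "dogs chase cats in the garden",
    "the dog barked at the mailman",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
    "the central bank raised interest rates",
    "rates and inflation worry investors",
]
tfidf_matrix = TfidfVectorizer().fit_transform(docs)  # SciPy sparse matrix

# cdist/pdist need a dense 2-D array. This is fine for small data,
# but can use a lot of memory for a large corpus.
X = tfidf_matrix.toarray()
n_samples = X.shape[0]

# The number of clusters cannot exceed the number of documents.
K = range(1, 6)
KM = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(tfidf_matrix) for k in K]
centroids = [km.cluster_centers_ for km in KM]

# Distance from every document to every centroid, then to the nearest one.
D_k = [cdist(X, cent, "euclidean") for cent in centroids]
dist = [np.min(D, axis=1) for D in D_k]
avg_within = np.array([d.sum() / n_samples for d in dist])

# Within-cluster, total, and between-cluster sums of squares.
wcss = np.array([np.sum(d ** 2) for d in dist])
tss = np.sum(pdist(X) ** 2) / n_samples
bss = tss - wcss  # works element-wise because wcss is a NumPy array

for k, w, b in zip(K, wcss, bss):
    print(f"k={k}: wcss={w:.3f}, explained={b / tss:.1%}")
```

Plotting `K` against `avg_within` (or against `bss / tss`) should then give the elbow curve. Each fitted `KMeans` object also exposes `inertia_`, the within-cluster sum of squares, which may make the `cdist` step unnecessary if only that quantity is needed.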

0 answers:

No answers