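I am trying to compute the Euclidean distance of my documents from their centroids. The two arrays in question (X and centers) have dimensions that satisfy the XA and XB requirements of scipy.spatial.distance.cdist, but I don't understand why I get the following ValueError.
My code: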
import pandas as pd, numpy as np
from scipy.spatial.distance import cdist
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
corpus = pd.Series(["bye bye brutal good bye apple banana orange", "bye bye hello apple banana", "corn wheat apple banana goodbye cookie brutal", "fruit cake banana apple bye sweet sweet"])
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
model = KMeans(n_clusters = 2)
model.fit(X)
centers = model.cluster_centers_
cdist(X, centers)  # this is the call that raises the ValueError
This is the error I get:
ValueError: setting an array element with a sequence.
From the documentation for scipy.spatial.distance.cdist:
Parameters: XA: ndarray
An Ma by n array of Ma original observations in an n-dimensional space
XB: ndarray
An Mb by n array of Mb original observations in an n-dimensional space
...
My X and centers arrays surely satisfy these dimension requirements of cdist, right? What am I missing?
Answer 0 (score: 2)
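You need to make one small change:
cdist(X.toarray(), centers)
X is an object of type scipy.sparse.csr.csr_matrix, so the SciPy function does not accept it directly as valid input. The toarray() method converts it into a regular NumPy array, which cdist can handle.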
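
For reference, here is a minimal end-to-end sketch of the fix above. It reuses the corpus from the question; the `n_init` and `random_state` arguments are my own illustrative choices, not part of the original code. It also shows one possible alternative, `sklearn.metrics.pairwise.euclidean_distances`, which accepts sparse input directly and may be preferable when the TF-IDF matrix is too large to densify.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import euclidean_distances

corpus = [
    "bye bye brutal good bye apple banana orange",
    "bye bye hello apple banana",
    "corn wheat apple banana goodbye cookie brutal",
    "fruit cake banana apple bye sweet sweet",
]

# TfidfVectorizer returns a SciPy sparse matrix, not a NumPy ndarray.
X = TfidfVectorizer().fit_transform(corpus)
print(type(X).__name__, sparse.issparse(X))

# n_init and random_state are assumptions added for reproducibility.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = model.cluster_centers_  # dense ndarray, shape (2, n_features)

# Fix from the answer: densify X before handing it to cdist.
dense_dist = cdist(X.toarray(), centers)

# Alternative: euclidean_distances accepts the sparse matrix as-is.
direct_dist = euclidean_distances(X, centers)

print(dense_dist.shape)  # one row per document, one column per centroid
```

Both approaches should give the same distances up to floating-point error; the difference is only whether X is converted to a dense array first.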