使用sklearn.cluster.KMeans。几乎这个确切的代码工作得更早,我改变的是我构建数据集的方式。我甚至不知道从哪里开始......以下是代码:
from sklearn.cluster import KMeans
km = KMeans(n_clusters=20)
for item in dfX:
if type(item) != type(dfX[0]):
print(item)
print(len(dfX))
print(dfX[:10])
km.fit(dfX)
print(km.cluster_centers_)
其中输出以下内容:
12147
[1.201, 1.237, 1.092, 1.074, 0.979, 0.885, 1.018, 1.083, 1.067, 1.071]
/home/sbendl/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Traceback (most recent call last):
File "/home/sbendl/PycharmProjects/MLFP/K-means.py", line 20, in <module>
km.fit(dfX)
File "/home/sbendl/anaconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py", line 812, in fit
X = self._check_fit_data(X)
File "/home/sbendl/anaconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py", line 789, in _check_fit_data
X.shape[0], self.n_clusters))
ValueError: n_samples=1 should be >= n_clusters=20
Process finished with exit code 1
从输出中可以看出,肯定有12147个样本,在大多数计数系统中大于20;)。此外,它们都是浮动的,所以它不会有问题。有人有什么想法吗?