K-means clustering: number of samples smaller than number of clusters

Time: 2017-08-01 14:55:17

Tags: python-2.7 k-means

I am new to K-means, so I hope someone can help me with the following problem. I am fitting a MiniBatchKMeans model like this:

mbk = MiniBatchKMeans(n_clusters=3, init_size=400, batch_size=300, verbose=1).fit(model_dm.docvecs[20000])

But I get this error:

 /usr/local/lib64/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)

ValueError                                Traceback (most recent call last)
<ipython-input-6-43cf0431aa1e> in <module>()
      6 
      7 
----> 8 mbk = MiniBatchKMeans(n_clusters=3, init_size=400, batch_size=300, verbose=1).fit(model_dm.docvecs[20000])
      9 
     10 

/usr/local/lib64/python2.7/site-packages/sklearn/cluster/k_means_.pyc in fit(self, X, y)
   1236         n_samples, n_features = X.shape
   1237         if n_samples < self.n_clusters:
-> 1238             raise ValueError("Number of samples smaller than number "
   1239                              "of clusters.")
   1240 

ValueError: Number of samples smaller than number of clusters.

1 answer:

Answer 0: (score: 0)

Have you tried taking a slice instead? If docvecs behaves like a list, then model_dm.docvecs[20000] returns only its 20001st element, which is a single vector rather than a collection of samples. The following selects the first 20000 vectors:

model_dm.docvecs[:20000]
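
To see why the index and the slice behave differently, here is a minimal sketch using plain Python lists. The names doc_vectors and as_samples are hypothetical stand-ins, not part of gensim or scikit-learn. The helper only imitates, under that assumption, what the deprecation warning above describes: a single flat vector gets treated as one sample, so the sample count (1) ends up below n_clusters (3).

```python
# Hypothetical stand-in for model_dm.docvecs: 5 document vectors, 4 features each.
doc_vectors = [[float(i + j) for j in range(4)] for i in range(5)]


def as_samples(data):
    """Return data as a list of rows (illustrative helper, not a library call).

    A flat vector of numbers is wrapped so that it counts as a single sample,
    which is roughly what the warning in the question says happens to 1d input.
    """
    if data and not isinstance(data[0], list):
        return [data]
    return data


n_clusters = 3

single = as_samples(doc_vectors[3])   # integer index: one vector, so 1 sample
subset = as_samples(doc_vectors[:3])  # slice: the first 3 vectors, so 3 samples

# The check in the traceback is "n_samples < n_clusters".
print(len(single), len(single) < n_clusters)  # 1 True  -> would raise ValueError
print(len(subset), len(subset) < n_clusters)  # 3 False -> enough samples to fit
```

Whether model_dm.docvecs itself supports slicing depends on the gensim version in use, so it is worth checking the type and length of the sliced result before passing it to fit.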