Unbalanced clusters with spectral co-clustering in sklearn

Time: 2018-07-24 12:31:56

Tags: python scikit-learn tf-idf svd

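I am following the sklearn example on spectral co-clustering of documents and words (http://scikit-learn.org/stable/auto_examples/bicluster/plot_bicluster_newsgroups.html#sphx-glr-auto-examples-bicluster-plot-bicluster-newsgroups-py), and my clusters come out very unbalanced. This is the output I get: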

Vectorizing...
Coclustering...
Done in 7.20s. V-measure: 0.4267
MiniBatchKMeans...
Done in 9.13s. V-measure: 0.4414

Best biclusters:
----------------
bicluster 0 : 8 documents, 6 words
categories   : 100% talk.politics.mideast
words        : angmar, cosmo, alfalfa, alphalpha, proline, benson

bicluster 1 : 4 documents, 9 words
categories   : 100% comp.windows.x
words        : elin, eeam, ges, energeanwendung, penzingerstr, gesmbh, energieanwendung, hochreiter, wien

bicluster 2 : 14 documents, 33 words
categories   : 86% comp.windows.x, 14% talk.politics.mideast
words        : rpicas, porto, wg2, se05, libxmu, waii, xmu, picas, inescn, ep130

bicluster 3 : 2809 documents, 4242 words
categories   : 25% comp.windows.x, 21% comp.sys.ibm.pc.hardware, 20% comp.graphics
words        : windows, scsi, motif, ide, graphics, pc, card, window, bmug, controller

bicluster 4 : 5166 documents, 5686 words
categories   : 16% rec.motorcycles, 15% rec.autos, 14% sci.electronics
words        : autos, motorcycles, bike, car, sale, engine, dod, bmw, engr, honda

I don't know why my result differs from the tutorial's. Is there a way to make the clusters more balanced?
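For reference, the imbalance can be quantified by counting how many rows (documents) and columns (words) land in each bicluster. The following is only a minimal sketch: it assumes a synthetic matrix from `make_biclusters` as a stand-in for the tf-idf matrix of the newsgroups example, and the variable names (`row_sizes`, `col_sizes`) are illustrative, not taken from the tutorial.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

n_clusters = 5

# Assumption: synthetic data stands in for the tf-idf document-term matrix.
X, _, _ = make_biclusters(
    shape=(300, 300), n_clusters=n_clusters, noise=5, random_state=0
)

model = SpectralCoclustering(n_clusters=n_clusters, random_state=0).fit(X)

# Each row and each column gets exactly one cluster label.
row_sizes = np.bincount(model.row_labels_, minlength=n_clusters)
col_sizes = np.bincount(model.column_labels_, minlength=n_clusters)

for k in range(n_clusters):
    print(f"bicluster {k} : {row_sizes[k]} documents, {col_sizes[k]} words")

# A ratio far above 1 indicates strongly unbalanced row clusters.
imbalance = row_sizes.max() / max(row_sizes.min(), 1)
print("largest / smallest row cluster:", imbalance)
```

On the real data, the same counts could be computed from the fitted model's `row_labels_` and `column_labels_` without changing anything else.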

I even coded my "own" spectral co-clustering following this link, using the SVD from `scipy.sparse.linalg` and `KMeans` from sklearn, but I run into the same problem...
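A hand-rolled version along those lines might look like the sketch below. It is not the code from the question or from the linked page (neither is shown here). It is a hedged reconstruction of the co-clustering recipe by Dhillon that sklearn's `SpectralCoclustering` documentation refers to: scale the matrix by the inverse square roots of its row and column sums, take a truncated SVD, drop the leading singular vector pair, and run k-means on the stacked row and column embeddings. The function name `spectral_cocluster` and the toy matrix are hypothetical, and the sketch assumes a non-negative matrix with no all-zero rows or columns.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans


def spectral_cocluster(X, n_clusters, random_state=0):
    """Hypothetical helper: co-cluster rows and columns of a non-negative matrix."""
    X = csr_matrix(X, dtype=np.float64)

    # Assumption: every row and column has a positive sum.
    d1 = 1.0 / np.sqrt(np.asarray(X.sum(axis=1)).ravel())
    d2 = 1.0 / np.sqrt(np.asarray(X.sum(axis=0)).ravel())
    A_n = diags(d1) @ X @ diags(d2)

    # One extra vector is computed because the leading pair is discarded.
    n_sv = 1 + int(np.ceil(np.log2(n_clusters)))
    u, s, vt = svds(A_n, k=n_sv)

    # svds does not guarantee descending order, so sort explicitly.
    order = np.argsort(s)[::-1]
    u = u[:, order[1:]]
    v = vt[order[1:], :].T

    # Rows and columns are embedded in the same space and clustered jointly.
    z = np.vstack([d1[:, None] * u, d2[:, None] * v])
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(z)

    n_rows = X.shape[0]
    return labels[:n_rows], labels[n_rows:]


# Toy matrix with three noisy blocks (illustrative data only).
rng = np.random.default_rng(0)
X_demo = rng.random((30, 20)) + 0.1
X_demo[:10, :7] += 5.0
X_demo[10:20, 7:14] += 5.0
X_demo[20:, 14:] += 5.0

row_labels, col_labels = spectral_cocluster(X_demo, n_clusters=3)
print(np.bincount(row_labels, minlength=3), np.bincount(col_labels, minlength=3))
```

Because rows and columns are clustered jointly in one k-means run, a handful of rare words and the few documents containing them can form a tiny bicluster of their own, which would be consistent with the small biclusters in the output above.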

Thanks!

0 answers:

No answers