Question

I am trying to find the right number of clusters, k, based on the silhouette score, using sklearn.cluster.MiniBatchKMeans.

from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer

docs = ['hello monkey goodbye thank you', 'goodbye thank you hello', 'i am going home goodbye thanks', 'thank you very much sir', 'good golly i am going home finally']

vectorizer = HashingVectorizer()

X = vectorizer.fit_transform(docs)

for k in range(5):
    model = MiniBatchKMeans(n_clusters = k)
    model.fit(X)

I get this error:

Warning (from warnings module):
  File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 1279
    0, n_samples - 1, init_size)
DeprecationWarning: This function is deprecated. Please call randint(0, 4 + 1) instead
Traceback (most recent call last):
  File "<pyshell#85>", line 3, in <module>
    model.fit(X)
  File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 1300, in fit
    init_size=init_size)
  File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 640, in _init_centroids
    x_squared_norms=x_squared_norms)
  File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 88, in _k_init
    n_local_trials = 2 + int(np.log(n_clusters))
OverflowError: cannot convert float infinity to integer

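I know type(k) is int, so I don't know where this problem comes from. I can run the following just fine, but I can't seem to iterate over the integers in a list, even though type(2) is the same as the result of k = 2; type(k).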

model = MiniBatchKMeans(n_clusters = 2)
model.fit(X)

Even running a different model works:

>>> model = KMeans(n_clusters = 2)
>>> model.fit(X)
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=None, tol=0.0001,
    verbose=0)

Answer 1

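Let's analyze your code:

for k in range(5) yields the following sequence:
- 0, 1, 2, 3, 4
model = MiniBatchKMeans(n_clusters = k) creates the model with n_clusters=k
Let's look at the first iteration:
- n_clusters=0 is used
- Inside the optimization code (see the traceback output):
- int(np.log(n_clusters))
- = int(np.log(0))
- = int(-inf)
- Error: infinity has no integer representation!
- --> Casting the float value -inf to int is not possible!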
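
The chain above can be reproduced without scikit-learn. This is only a minimal sketch of the arithmetic: the helper name n_local_trials is made up for illustration, and its body simply copies the expression shown on the last line of the traceback.

```python
import numpy as np


def n_local_trials(n_clusters):
    # Hypothetical helper: same expression as the traceback line in _k_init.
    return 2 + int(np.log(n_clusters))


# np.log(0) evaluates to -inf. NumPy normally also emits a RuntimeWarning
# for the division by zero; it is silenced here to keep the output clean.
with np.errstate(divide="ignore"):
    log_of_zero = np.log(0)
    try:
        n_local_trials(0)
        error_message = None
    except OverflowError as exc:
        # int() cannot represent an infinite float.
        error_message = str(exc)

print(log_of_zero)
print(error_message)
print(n_local_trials(2))  # a positive n_clusters works fine
```

So the problem is the value 0 produced by range(5), not the type of k.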

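Setting n_clusters=0 makes no sense! Start the loop at k=2 or higher, for example range(2, 5), since a silhouette score also needs at least two clusters.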
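
A possible corrected loop, as a sketch only: it assumes the goal is to pick k by silhouette score as described in the question. The n_init and random_state values, the scores dictionary, and the guard on the number of labels are my own choices, not part of the original code. silhouette_score needs between 2 and n_samples - 1 distinct labels, so with five documents only k = 2, 3, 4 are usable.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics import silhouette_score

docs = ['hello monkey goodbye thank you',
        'goodbye thank you hello',
        'i am going home goodbye thanks',
        'thank you very much sir',
        'good golly i am going home finally']

X = HashingVectorizer().fit_transform(docs)
n_samples = X.shape[0]

scores = {}
# Start at 2 (not 0) and stop at n_samples - 1.
for k in range(2, n_samples):
    # n_init and random_state are assumptions, set for repeatable runs.
    model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0)
    labels = model.fit_predict(X)
    # Guard: the fit may end up using fewer distinct clusters than requested.
    if 2 <= len(set(labels)) <= n_samples - 1:
        scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get) if scores else None
print(scores)
print(best_k)
```

With only five short documents the scores are not very meaningful; the point is just that the loop runs once k never takes the value 0.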
