Question

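I want to write my own MATLAB code for the "MajorClust" algorithm. I have pairs of documents together with their cosine similarity. While searching online, I came across this site: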

http://muse-amuse.in/~baali/MajorClustPost.html

In the example on that site (written in Python), the clustering part looks like this:

t = False
indices = np.arange(num_of_samples)
while not t:
    t = True
    for index in np.arange(num_of_samples):
        # aggregating edge weights
        new_index = np.argmax(np.bincount(indices,
                                          weights=cosine_distances[index]))
        if indices[new_index] != indices[index]:
            indices[index] = indices[new_index]
            t = False

When I step through the example, I get a bit confused. Consider the for loop:

for index in np.arange(num_of_samples):

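The first index will be 0, and its maximum similarity is with 1. So new_index must be 1, and index 0 is replaced with 1.

On the next iteration, index will be 1, and its maximum weight comes from 0, which now has the same index that was assigned in the previous iteration. So the loop must terminate after this point.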
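
This reasoning can be checked by tracing the quoted loop on a small input. The following is only a minimal sketch: the function name `trace_majorclust`, the `history` list, and the 4-document similarity matrix are made up for illustration and do not come from the site or the paper. The loop body itself mirrors the quoted code.

```python
import numpy as np

def trace_majorclust(similarity):
    """Run the quoted loop in fixed order and record every node visit."""
    num_of_samples = similarity.shape[0]
    indices = np.arange(num_of_samples)
    history = []   # (pass number, visited node, labels after the visit)
    passes = 0
    t = False
    while not t:
        t = True
        passes += 1
        for index in range(num_of_samples):
            # Sum the edge weights from this node to each current cluster label
            new_index = np.argmax(np.bincount(indices, weights=similarity[index]))
            if indices[new_index] != indices[index]:
                indices[index] = indices[new_index]
                t = False
            history.append((passes, index, indices.tolist()))
    return indices, passes, history

# Hypothetical similarity matrix: documents 0/1 and 2/3 form two tight pairs
similarity = np.array([
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
])

labels, passes, history = trace_majorclust(similarity)
for step in history:
    print(step)
```

On this toy input the for loop keeps going after nodes 0 and 1 have settled: nodes 2 and 3 are still visited in the same pass, and the while loop only ends after a second full pass in which no label changes.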

The algorithm is based on this paper (see page 4):

http://www.uni-weimar.de/medien/webis/publications/papers/stein_2002c.pdf

The paper states that the indices must be chosen randomly, but I can't see any random selection in this example.

What am I missing?

Answer 1

Yes, it would be better to shuffle the indices, and you can do that with:

from random import shuffle

# Meant to sit inside the `while not t:` loop, in place of the original for loop
shuffled_indices = np.arange(num_of_samples)
shuffle(shuffled_indices)  # visit the samples in random order
for index in shuffled_indices:
    # aggregating edge weights
    new_index = np.argmax(np.bincount(indices, weights=cosine_distances[index]))
    if indices[new_index] != indices[index]:
        indices[index] = indices[new_index]
        t = False
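
For reference, here is how the shuffled loop might look as a complete, runnable function. This is a sketch under assumptions, not the site's or the paper's implementation: the name `majorclust_shuffled`, the `seed` and `max_passes` parameters, and the toy matrix are hypothetical, and it uses NumPy's `Generator.permutation` instead of `random.shuffle` to draw a new order on every pass.

```python
import numpy as np

def majorclust_shuffled(similarity, seed=None, max_passes=100):
    """Repeat passes in a fresh random order until no label changes."""
    rng = np.random.default_rng(seed)
    num_of_samples = similarity.shape[0]
    indices = np.arange(num_of_samples)
    # max_passes is a hypothetical safety cap, not part of the quoted code
    for _ in range(max_passes):
        t = True
        # Draw a new visiting order for every pass
        for index in rng.permutation(num_of_samples):
            # Sum the edge weights from this node to each current cluster label
            new_index = np.argmax(np.bincount(indices, weights=similarity[index]))
            if indices[new_index] != indices[index]:
                indices[index] = indices[new_index]
                t = False
        if t:
            break
    return indices

# Hypothetical similarity matrix: documents 0/1 and 2/3 form two tight pairs
similarity = np.array([
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
])

labels = majorclust_shuffled(similarity, seed=0)
print(labels)
```

The label values themselves depend on the visiting order, so only the grouping should be compared between runs, not the numbers.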

Sorry for the late reply.
