Question

I am trying to cluster short documents, such as the following:

sentences<-c("The color blue neutralizes orange yellow reflections.", 
             "Zod stabbed me with blue Kryptonite.", 
             "Because blue is your favourite colour.",
             "Red is wrong, blue is right.",
             "You and I are going to yellowstone.",
             "Van Gogh looked for some yellow at sunset.",
             "You ruined my beautiful green dress.",
             "You do not agree.",
             "There's nothing wrong with green.")

In the initialization step of my code, I need to randomly assign the documents to K clusters according to a Dirichlet-multinomial distribution.

How can I do this?
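A minimal sketch in base R, assuming a symmetric prior (alpha = 1 for each of the K clusters). It is a two-stage draw: first sample cluster proportions theta from Dirichlet(alpha) by normalising independent Gamma draws (base R has no built-in Dirichlet sampler, so rdirichlet1 is a hypothetical helper defined here), then give each document one cluster label drawn from Categorical(theta). The resulting cluster sizes follow a Dirichlet-multinomial(D, alpha) distribution.

```r
# Hypothetical helper: one draw from Dirichlet(alpha), obtained by
# normalising independent Gamma(alpha_j, rate = 1) draws.
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

set.seed(1)              # reproducible initialization
D <- 9L                  # number of documents (the 9 sentences above)
K <- 2L                  # number of clusters
alpha <- rep(1, K)       # assumed symmetric prior, one entry per cluster

theta <- rdirichlet1(alpha)                          # cluster proportions, sum to 1
z <- sample.int(K, D, replace = TRUE, prob = theta)  # one cluster label per document
```

The cluster sizes can then be inspected with table(factor(z, levels = 1:K)); z[i] is the initial cluster of sentences[i].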

Edit: thanks to @ags29's comment, I found the following in Sampling from Dirichlet-Multinomial:

D=9  # number of documents in the corpus; I have 9 sentences in my example
k=2  # number of clusters (e.g. 2)
alpha=runif(D)  # value of alpha, here chosen at random
p=rgamma(D,alpha)  # Gamma draws; normalised, p/sum(p) is a Dirichlet(alpha) sample
x=rmultinom(1,k,p)  # one multinomial draw of size k over D categories (p is normalised internally)
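Note that with these arguments p has length D, so rmultinom(1, k, p) returns a D x 1 matrix of counts that sum to k: it spreads k draws over the D documents rather than giving each document a cluster label. A minimal sketch that keeps the same Gamma-then-multinomial structure but swaps the roles of D and k (assuming one concentration parameter per cluster, still chosen at random as above) could look like this:

```r
set.seed(1)                    # reproducible initialization
D <- 9L                        # number of documents
k <- 2L                        # number of clusters
alpha <- runif(k)              # one concentration parameter per cluster (random, as above)

p <- rgamma(k, shape = alpha)  # Gamma draws, one per cluster
p <- p / sum(p)                # normalised: a Dirichlet(alpha) sample of cluster proportions

x <- rmultinom(D, size = 1, prob = p)  # k x D matrix; each column is a one-hot assignment
cluster <- apply(x, 2, which.max)      # cluster label for each document
```

Summing the rows of x (rowSums(x)) gives the cluster sizes, which together follow a Dirichlet-multinomial(D, alpha) distribution.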

What do you think?

