I am trying to cluster short documents, such as the following:
sentences<-c("The color blue neutralizes orange yellow reflections.",
"Zod stabbed me with blue Kryptonite.",
"Because blue is your favourite colour.",
"Red is wrong, blue is right.",
"You and I are going to yellowstone.",
"Van Gogh looked for some yellow at sunset.",
"You ruined my beautiful green dress.",
"You do not agree.",
"There's nothing wrong with green.")
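In the initialization step of my code, I need to randomly assign the documents to K
clusters according to a Dirichlet-multinomial distribution.
How can I do this?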
Edit: thanks to @ags29's comment, I found the following in "Sampling from Dirichlet-Multinomial":
D=9 # number of documents in the corpus; I have 9 sentences in my example
k=2 # number of clusters (e.g. 2)
alpha=runif(D) # value of alpha, here chosen at random
p=rgamma(D,alpha) # pre-simulation of the Dirichlet
x=rmultinom(1,k,p)
What do you think?
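
For comparison, here is a minimal sketch of one common reading of this initialization: draw cluster proportions from a Dirichlet with one parameter per cluster (length K, not D), then draw one cluster label per document. Note that `rmultinom(1, k, p)` in the snippet above distributes k draws over D categories, so it returns a length-9 count vector summing to 2 rather than one label per document. The symmetric prior `alpha <- rep(1, K)` and the names `theta`, `z` and `sizes` are assumptions made for illustration, not part of the referenced answer.

```r
set.seed(42)        # arbitrary seed, only so the sketch is reproducible

D <- 9              # number of documents (the 9 example sentences)
K <- 2              # number of clusters
alpha <- rep(1, K)  # assumed symmetric Dirichlet prior, one entry per cluster

# Dirichlet draw via normalized Gamma variates: theta has length K and sums to 1
g <- rgamma(K, shape = alpha, rate = 1)
theta <- g / sum(g)

# One cluster label in 1..K for each of the D documents
z <- sample.int(K, size = D, replace = TRUE, prob = theta)

# Cluster sizes; marginally over theta these are Dirichlet-multinomial distributed
sizes <- tabulate(z, nbins = K)
```

With this, `split(sentences, z)` would group the example sentences by their initial cluster. If only the cluster sizes are needed, `rmultinom(1, D, theta)` gives them directly.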