Incorrect LDA results

Time: 2015-06-15 17:42:11

Tags: machine-learning nlp lda topic-modeling gensim

I'm relatively new to Gensim and to LDA in general. My problem is that when I run LDA on my corpus, the token weights in every topic come out as 0:

2015-06-15 12:21:12,439 : INFO : topic diff=0.082235, rho=0.250000

2015-06-15 12:21:12,454 : INFO : topic #0 (0.100): 0.000*sundayes + 0.000*nowe + 0.000*easter + 0.000*iniunctions + 0.000*eyther + 0.000*christ, + 0.000*authoritie + 0.000*sir + 0.000*saint + 0.000*thinge

2015-06-15 12:21:12,468 : INFO : topic #1 (0.100): 0.000*eu'n + 0.000*ioseph + 0.000*pharohs + 0.000*pharoh + 0.000*iosephs + 0.000*lo! + 0.000*egypts + 0.000*iacob + 0.000*ioseph, + 0.000*beniamin

2015-06-15 12:21:12,482 : INFO : topic #2 (0.100): 0.000*friendly + 0.000*spread, + 0.000*two-quarters + 0.000*controversial + 0.000*background, + 0.000*vicars, + 0.000*sacrament + 0.000*contrary + 0.000*parsons, + 0.000*propitiatorie

2015-06-15 12:21:12,495 : INFO : topic #3 (0.100): 0.000*yf + 0.000*suche + 0.000*lyke + 0.000*shoulde + 0.000*moste + 0.000*youre + 0.000*oure + 0.000*lyfe, + 0.000*anye + 0.000*thinges

2015-06-15 12:21:12,507 : INFO : topic #4 (0.100): 0.000*heau'nly + 0.000*eu'n + 0.000*heau'n + 0.000*sweet + 0.000*peace + 0.000*eu'ry + 0.000*constance + 0.000*constant + 0.000*doth + 0.000*oh

2015-06-15 12:21:12,521 : INFO : topic #5 (0.100): 0.000*eu'n + 0.000*ioseph + 0.000*pharohs + 0.000*pharoh + 0.000*vel + 0.000*iosephs + 0.000*heau'n + 0.000*lo! + 0.000*ac + 0.000*seu'n

2015-06-15 12:21:12,534 : INFO : topic #6 (0.100): 0.000*you + 0.000*will + 0.000*love + 0.000*king + 0.000*sir, + 0.000*doe + 0.000*thee + 0.000*1. + 0.000*never + 0.000*2.

2015-06-15 12:21:12,546 : INFO : topic #7 (0.100): 0.000*quae + 0.000*vt + 0.000*qui + 0.000*ij + 0.000*non + 0.000*ad + 0.000*si + 0.000*vel + 0.000*atque + 0.000*cum

2015-06-15 12:21:12,558 : INFO : topic #8 (0.100): 0.000*doubt + 0.000*hypersensitive + 0.000*squire + 0.000*priest + 0.000*commoner + 0.000*vsed, + 0.000*english, + 0.000*fortnight + 0.000*squire, + 0.000*criminal

2015-06-15 12:21:12,572 : INFO : topic #9 (0.100): 0.001*/ + 0.001*ile + 0.000*y^e + 0.000*che + 0.000*much + 0.000*tis + 0.000*can + 0.000*oh + 0.000*neuer + 0.000*heart

I have 307 documents. After removing stop words, I run LDA with the following code:

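from gensim import corpora, models

# Keep only tokens that occur more than 3 times (frequency maps token -> count)
texts = [[token for token in text if frequency[token] > 3] for text in texts]

# Map tokens to integer ids and build the bag-of-words corpus
dictionary = corpora.Dictionary(texts)

corpus = [dictionary.doc2bow(text) for text in texts]

# Re-weight the bag-of-words counts with tf-idf
tfidf = models.TfidfModel(corpus)

tfidf_corpus = tfidf[corpus]

# Train LDA on the tf-idf corpus
lda = models.LdaModel(tfidf_corpus, id2word=dictionary, update_every=1, chunksize=20, num_topics=10, passes=1)

lda[tfidf_corpus]

lda.print_topics(10)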

I'm not sure what is going wrong, but every time I run it the token weights are 0. What could be causing this, and how can I fix it?
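
One thing that might be worth ruling out first is whether the weights are truly zero or only rounded to three decimals in the log. A topic is a probability distribution over the whole vocabulary, so with many distinct tokens even the top weights can fall below 0.0005. The sketch below is a pure-Python illustration, not Gensim code: `format_topic` is a hypothetical helper, and the vocabulary size and weights are made-up assumptions, since the question does not state them.

```python
# Hypothetical illustration: weights that look like 0 at three decimals
# may simply be small. Nothing here calls Gensim.

def format_topic(weights, digits):
    """Format (token, weight) pairs in the style of the log, at a chosen precision."""
    return " + ".join(f"{w:.{digits}f}*{tok}" for tok, w in weights)

# Assumed vocabulary size; the real one is not given in the question.
vocab_size = 50_000

# In a near-uniform topic every token gets roughly 1 / vocab_size.
base = 1.0 / vocab_size

# Made-up top tokens, a few times heavier than the uniform baseline.
top_words = [("sundayes", base * 4), ("nowe", base * 3), ("saint", base * 2)]

three_digits = format_topic(top_words, 3)  # same precision as the log above
six_digits = format_topic(top_words, 6)    # more digits reveal the non-zero weights

print(three_digits)
print(six_digits)
```

If this is what is happening, inspecting the unrounded values (for example through `lda.show_topic(topic_id)`, which returns the top tokens with their weights as floats) should show small but non-zero numbers. If they are exactly zero, the cause lies elsewhere.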

0 answers:

No answers yet