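IndexError when using the Gensim package for LDA topic modeling

Asked: 2014-01-23 16:09:49

Tags: python lda topic-modeling gensim

I have 54892 documents in total, containing 360331 unique tokens. The length of the dictionary is 88.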

mm = corpora.MmCorpus('PRC.mm')
dictionary = corpora.Dictionary('PRC.dict')
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=dictionary, num_topics=50, update_every=0, chunksize=19188, passes=650)

Whenever I run this script, I get this error:

Traceback (most recent call last):
  File "C:\Users\modelDeTopics.py", line 19, in <module>
    lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=dictionary, num_topics=50, update_every=0, chunksize=19188, passes=650)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 265, in __init__
    self.update(corpus)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 445, in update
    self.do_estep(chunk, other)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 365, in do_estep
    gamma, sstats = self.inference(chunk, collect_sstats=True)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 318, in inference
    expElogbetad = self.expElogbeta[:, ids]
IndexError: index 8 is out of bounds for axis 1 with size 8

I read online that this may be related to my computer's RAM. I am using 32-bit Windows with 4 GB of RAM. What changes should I make to the script?

Please help!

1 Answer:

Answer 0: (score: 0)

Your dictionary looks broken. 88 unique words does not sound plausible for a corpus with 360331 unique tokens.
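One possible explanation, offered as an assumption rather than a confirmed diagnosis: corpora.Dictionary('PRC.dict') may not read the file at all. The constructor's first argument is an iterable of tokenized documents, so a plain string could end up being iterated character by character. The string 'PRC.dict' has exactly 8 distinct characters, which would match the "size 8" in the traceback. The sketch below reproduces that count in plain Python, without gensim; vocabulary_from_documents is a hypothetical stand-in for a dictionary builder, not gensim's actual code.

```python
def vocabulary_from_documents(documents):
    """Assign an integer id to every distinct token, in first-seen order.

    Hypothetical stand-in for a dictionary builder: `documents` is expected
    to be an iterable of token lists.
    """
    token2id = {}
    for document in documents:
        for token in document:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


# Intended input: an iterable of tokenized documents.
proper = vocabulary_from_documents([["topic", "model"], ["topic", "corpus"]])

# Accidental input: a file name. Iterating a string yields single characters,
# so every "document" is one character and every "token" is that character.
accidental = vocabulary_from_documents("PRC.dict")

print(len(proper))         # 3 distinct tokens
print(sorted(accidental))  # the distinct characters of the file name
print(len(accidental))     # 8
```

If PRC.dict was written with dictionary.save('PRC.dict'), it would normally be read back with corpora.Dictionary.load('PRC.dict'); a file written with save_as_text would be read with corpora.Dictionary.load_from_text instead. Which one applies depends on how the file was created, which the question does not show.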

Posting the full log would reveal more.
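Alongside the log, a quick consistency check can show whether the corpus refers to token ids that the dictionary does not contain, which is what "index 8 is out of bounds for axis 1 with size 8" suggests. This is a minimal sketch, assuming the corpus is in gensim's bag-of-words form (each document a list of (token_id, count) pairs); out_of_range_ids is a hypothetical helper, and the toy corpus is invented for illustration.

```python
def out_of_range_ids(corpus, num_terms):
    """Return the sorted token ids in `corpus` that are >= `num_terms`.

    `corpus` is an iterable of documents, each a list of (token_id, count)
    pairs. `num_terms` is the vocabulary size the model was built with.
    """
    bad = set()
    for document in corpus:
        for token_id, _count in document:
            if token_id >= num_terms:
                bad.add(token_id)
    return sorted(bad)


# Toy data (invented): ids up to 9 appear, but only 8 terms are known.
toy_corpus = [[(0, 2), (3, 1)], [(7, 1), (8, 4)], [(9, 1), (8, 1)]]
problems = out_of_range_ids(toy_corpus, num_terms=8)
print(problems)  # [8, 9]
```

With the objects from the question, the call would be out_of_range_ids(mm, len(dictionary)). An empty result would mean the corpus ids and the dictionary agree; a non-empty one points at the dictionary rather than at RAM.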