I am running LDA, following the gensim tutorial, on a corpus of 195145 documents, 6636308 features, and 188901082 non-zero entries. The code is simple:
from gensim import corpora, models, similarities

class MyCorpus(object):
    def __iter__(self):
        for line in open('/home/pda/xxz149/LDA/DrugPatents.csv'):
            # assume there's one document per line, tokens separated by ','
            yield dictionary.doc2bow(line.lower().split(','))

dictionary = corpora.Dictionary.load('/home/pda/xxz149/LDA/DrugPatent.dict')
corpus = MyCorpus()
lda = models.ldamodel.LdaModel(corpus, num_topics=300, id2word=dictionary, distributed=False, chunksize=15, passes=1)
lda.save('/home/pda/xxz149/LDA/lda_DrugPatent.model')
But I get a ValueError:
File "/usr/lib/python2.6/site-packages/gensim-0.10.0rc1-py2.6.egg/gensim/models/ldamodel.py", line 79, in __init__
self.sstats = numpy.zeros(shape)
ValueError: array is too big.
gensim is supposed to be memory-friendly, so why does this happen? How can I get past it?
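For a sense of scale, here is a rough estimate of the array that fails to allocate. It assumes, based on the traceback line `self.sstats = numpy.zeros(shape)`, that `shape` is `(num_topics, num_terms)` and that the array holds 8-byte floats. Both are assumptions, not something confirmed by the output above, and the variable names are only illustrative.

```python
# Numbers taken from the question; the array shape and dtype are assumptions.
num_topics = 300           # num_topics passed to LdaModel
num_terms = 6636308        # number of features in the dictionary
bytes_per_float64 = 8      # assumed element size (numpy.zeros defaults to float64)

# Dense (num_topics x num_terms) matrix, independent of how the corpus is streamed.
sstats_bytes = num_topics * num_terms * bytes_per_float64
sstats_gib = sstats_bytes / float(2 ** 30)

print("sstats would need about %.1f GiB" % sstats_gib)
```

If that estimate is right, the streamed corpus is not the problem: the array is dense and its size depends only on the number of topics and the vocabulary size, so it would shrink proportionally if either were reduced.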