
Time: 2014-11-17 16:19:22

Tags: python lda gensim

I'm working on a project in which I want to use Latent Dirichlet Allocation (LDA) to extract topics from a large collection of articles.


import gensim
import csv
import json
import glob
from gensim import corpora, models
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
from time import gmtime, strftime

tokenizer = RegexpTokenizer(r'\w+')
cachedStopWords = set(stopwords.words("english"))
body = []
processed = []

with open('/…/file.json') as j:
    data = json.load(j)

for i in range(0,len(data)):
    # Loop body lost from the post; it presumably appended each article's text to `body`.
    pass

for entry in body:
    row = tokenizer.tokenize(entry)
    processed.append([word for word in row if word not in cachedStopWords])

dictionary = corpora.Dictionary(processed)
corpus = [dictionary.doc2bow(text) for text in processed]
lda = gensim.models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=50, update_every=1, passes=1)
topics = lda.show_topics(num_topics=50, num_words=8)

other_doc = "After being jailed for life in 1964, Nelson Mandela became a worldwide symbol of resistance to apartheid. But his opposition to racism began many years before."
print lda[other_doc]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gensim/models/ldamodel.py", line 714, in __getitem__
    gamma, _ = self.inference([bow])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gensim/models/ldamodel.py", line 361, in inference
    ids = [id for id, _ in doc]
ValueError: need more than 1 value to unpack


lda = gensim.models.LdaMulticore(corpus, id2word=dictionary, num_topics=100, workers=3)
lda = gensim.models.ldamodel.LdaMulticore(corpus, id2word=dictionary, num_topics=100, workers=3)
lda = models.LdaMulticore(corpus, id2word=dictionary, num_topics=100, workers=3)


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'LdaMulticore'



2 Answers:

Answer 0: (score: 3)



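# lda[...] expects a bag-of-words vector (a list of (token_id, count) pairs), not a raw string
vec_bow = dictionary.doc2bow(other_doc.lower().split())
vec_lda = lda[vec_bow]  # topic distribution of the query document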
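
To see why the call in the question fails, here is a minimal pure-Python sketch that runs without gensim. The helper names `extract_ids` and `toy_doc2bow`, the toy vocabulary, and the sample sentence are made up for illustration: `extract_ids` only mimics the `ids = [id for id, _ in doc]` line from the traceback, and `toy_doc2bow` is a rough stand-in for the kind of output `dictionary.doc2bow` produces, not gensim's implementation.

```python
from collections import Counter


def extract_ids(doc):
    """Mimic the traceback line: ids = [id for id, _ in doc]."""
    return [token_id for token_id, _ in doc]


def toy_doc2bow(tokens, token2id):
    """Hypothetical stand-in for dictionary.doc2bow: count known tokens only."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Hypothetical vocabulary; a real one comes from corpora.Dictionary(processed).
token2id = {"mandela": 0, "apartheid": 1, "racism": 2}
other_doc = "Mandela opposed apartheid and Mandela opposed racism"

# Iterating a string yields single characters, which cannot be unpacked in pairs.
try:
    extract_ids(other_doc)
    raw_string_error = None
except ValueError as exc:
    raw_string_error = exc

# Tokenize first, then convert to (token_id, count) pairs.
bow = toy_doc2bow(other_doc.lower().split(), token2id)
ids = extract_ids(bow)
print(bow)
print(ids)
```

With the real model, the unseen document should probably be tokenized the same way as the training data. The question used `RegexpTokenizer(r'\w+')` plus stop-word removal and no lower-casing, and `doc2bow` by default silently drops tokens that are not in the dictionary.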

Answer 1: (score: 0)

I realize this is old, but I ran into the same problem. You are probably pointing at an older version of Gensim. Make sure you are using version >= 0.10.2.

Update with "easy_install -U gensim", then make sure your IDE sees the updated library.
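
One way to check which installation the interpreter actually picks up is sketched below. It assumes gensim exposes a `__version__` attribute, and the helper names `version_tuple` and `check_gensim` are made up for this example. The script reports the interpreter path, the installed version, and whether `gensim.models` has `LdaMulticore`. It also runs when gensim is missing.

```python
import importlib
import re
import sys

MIN_VERSION = (0, 10, 2)  # minimum version named in the answer above


def version_tuple(version):
    """Turn a version string such as '0.10.2' into a comparable tuple of ints."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def check_gensim():
    """Return (version, has_multicore), or None if gensim cannot be imported."""
    try:
        gensim = importlib.import_module("gensim")
        models = importlib.import_module("gensim.models")
    except Exception:  # any import-time failure counts as "not usable here"
        return None
    # Assumption: the installed gensim defines __version__.
    version = getattr(gensim, "__version__", "unknown")
    return version, hasattr(models, "LdaMulticore")


print("interpreter:", sys.executable)
result = check_gensim()
if result is None:
    print("gensim is not importable from this interpreter")
else:
    version, has_multicore = result
    print("gensim", version, "LdaMulticore available:", has_multicore)
    if version_tuple(version) < MIN_VERSION:
        print("older than 0.10.2; upgrade and check which interpreter the IDE uses")
```

Running this from the IDE and from the shell, then comparing the interpreter paths, should show whether the two are using different installations.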