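I'm using Gensim to compute the similarity between 2 documents. For some reason, the line tfidf[corpus] returns an empty list, and I'm not sure why.

from gensim import corpora, models

articles = []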
#make a corpus by adding each of the top 25 documents to a list
for x in range(0,25):
    articles.append(str(WikiDoc(sorted_links[0]).jsonify()['text']))
#tokenize each document: lowercase it and split on whitespace
texts = [document.lower().split() for document in articles]
print(texts)
#build the dictionary from the tokenized texts, save it, then load it back
articles_dict = corpora.Dictionary(texts)
articles_dict.save('./articles.dict')
articles_dict = corpora.Dictionary.load('./articles.dict')
#articles_corpus = [articles_dict.doc2bow(text) for text in texts]
#corpora.MmCorpus.serialize('./articles.mm', articles_corpus)
corpus = [articles_dict.doc2bow(text) for text in texts]
corpora.MmCorpus.serialize('./articles.mm', corpus)
corpus = corpora.MmCorpus('./articles.mm')
#build the tfidf model based on the 25 documents so that we can find similarities
#with respect to each of these documents
tfidf = models.TfidfModel(corpus)
#get the other document and process to produce dictionary representation
one_doc_bow = WikiDoc('SpongeBob')
one_doc_bow = articles_dict.doc2bow(one_doc_bow.jsonify()['text'].lower().split())
print(tfidf[one_doc_bow])
top = tfidf[one_doc_bow]
corpus_tfidf = tfidf[corpus]
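When I print the dictionary, I get: Dictionary(2204 unique tokens)
When I print the MmCorpus, I get: MmCorpus(25 documents, 2204 features, 55100 non-zero entries)
tfidf[corpus] yields [].
Can anyone diagnose my problem? Thanks a lot!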
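
One possible explanation, offered as a sketch rather than a confirmed diagnosis: 55100 is exactly 25 × 2204, so every one of the 25 documents seems to contain every one of the 2204 tokens. That is what you would see if the loop appended sorted_links[0] on every pass, giving 25 copies of the same article. With a document frequency equal to the number of documents, an inverse document frequency of the form log2(N / df) is log2(1) = 0, so every weight is zero. The toy code below does not call gensim; tfidf_weights is a hypothetical helper, the log2(N / df) weighting and the dropping of near-zero weights are my assumptions about gensim's default TfidfModel behavior, and normalization is skipped for brevity.

```python
import math
from collections import Counter


def tfidf_weights(bow_corpus, eps=1e-12):
    """Toy TF-IDF: weight = tf * log2(N / df); near-zero weights are dropped."""
    n_docs = len(bow_corpus)
    # Document frequency: how many documents each term id appears in.
    df = Counter(term_id for doc in bow_corpus for term_id, _ in doc)
    result = []
    for doc in bow_corpus:
        weighted = [(term_id, tf * math.log(n_docs / float(df[term_id]), 2))
                    for term_id, tf in doc]
        # Keep only terms whose weight is meaningfully non-zero.
        result.append([(t, w) for t, w in weighted if abs(w) > eps])
    return result


# 25 copies of one bag-of-words, as if every iteration fetched sorted_links[0].
same_doc = [(0, 3), (1, 1), (2, 2)]
identical = tfidf_weights([same_doc] * 25)

# A made-up corpus whose documents differ from each other.
varied = tfidf_weights([[(0, 3), (1, 1)], [(1, 2), (2, 1)], [(2, 4)]])

print(identical[0])  # no term survives, because every idf is log2(25/25) = 0
print(varied[0])     # terms that are absent from some documents keep a weight
```

If this matches what is happening, indexing with sorted_links[x] inside the loop (so the 25 articles actually differ) should give non-empty TF-IDF vectors, though I have not verified that against the original data.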