I convert a list of text documents into a corpus dictionary and then into a bag-of-words model like this:
dictionary = gensim.corpora.Dictionary(docs)  # docs is a list of tokenized documents (lists of tokens)
corpus = [dictionary.doc2bow(doc) for doc in docs]
We can look up the index of specific words in the dictionary like this:
dictionary.doc2idx(["righteous","height"])
Is there a way to go the other direction and find the word stored in the dictionary at a given index?
Answer 0 (score: 2)
TL;DR:
dictionary.get(index_of_word)
Example:
import gensim
docs = [['hello', 'world'], ['i', 'am', 'groot']]
dictionary = gensim.corpora.Dictionary(docs)  # docs is a list of tokenized documents (lists of tokens)
corpus = [dictionary.doc2bow(doc) for doc in docs]
print(dictionary.get(0))
print(dictionary.get(3))
Output:
hello
groot
Hope that helps!