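I'm learning to use Gensim for topic modeling, but I can't find a way to assign each document to a topic with the HDP model. Here is what I have so far: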
from gensim import corpora
from gensim.models import HdpModel
documents = ["The Saudis are preparing a report that will acknowledge that",
"Saudi journalist Jamal Khashoggi's death was the result of an",
"interrogation that went wrong, one that was intended to lead",
"to his abduction from Turkey, according to two sources.",
"One source says the report will likely conclude that",
"the operation was carried out without clearance and",
"transparency and that those involved will be held",
"responsible. One of the sources acknowledged that the",
"report is still being prepared and cautioned that",
"things could change."]
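# Tokenize each document on whitespace
texts = [doc.split() for doc in documents]
# Map each unique token to an integer id
dictionary = corpora.Dictionary(texts)
# Convert each document to a bag-of-words vector of (token_id, count) pairs
corpus = [dictionary.doc2bow(text) for text in texts]
# Train the HDP model; the number of topics is inferred from the data
hdpmodel = HdpModel(corpus=corpus, id2word=dictionary)
# Top words per topic, as (topic_id, [(word, weight), ...]) pairs
hdptopics = hdpmodel.show_topics(formatted=False)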
The final result should be a pandas DataFrame like this:
                                                 doc  topic
0  The Saudis are preparing a report that will ac...      1
1  Saudi journalist Jamal Khashoggi's death was t...      1
2  interrogation that went wrong, one that was in...      2
3  to his abduction from Turkey, according to two...      2
4  One source says the report will likely conclud...      3
5  the operation was carried out without clearanc...      3
6  transparency and that those involved will be held      4
7  responsible. One of the sources acknowledged t...      5
8  report is still being prepared and cautioned that      5
9                               things could change.      1
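To make the target concrete, here is a minimal sketch of how I imagine the rows could be built. It assumes that indexing the trained model with a single bag-of-words vector (`hdpmodel[bow]`) returns a list of `(topic_id, weight)` pairs, and that taking the highest-weighted topic is an acceptable way to "assign" a document. The helper names `dominant_topic`, `assign_topics` and `to_dataframe` are my own, not part of Gensim, and I'm not sure this is the intended approach.

```python
def dominant_topic(topic_dist):
    """Return the topic id with the highest weight, or None if the list is empty."""
    if not topic_dist:
        return None
    return max(topic_dist, key=lambda pair: pair[1])[0]


def assign_topics(model, corpus, documents):
    """Build one {"doc", "topic"} row per document.

    Assumption: model[bow] yields a list of (topic_id, weight) pairs
    for a single bag-of-words vector.
    """
    rows = []
    for doc, bow in zip(documents, corpus):
        rows.append({"doc": doc, "topic": dominant_topic(model[bow])})
    return rows


def to_dataframe(rows):
    """Wrap the rows in a pandas DataFrame (pandas is imported lazily)."""
    import pandas as pd
    return pd.DataFrame(rows, columns=["doc", "topic"])
```

With the variables from my code above, this would be called as `df = to_dataframe(assign_topics(hdpmodel, corpus, documents))`. The topic ids would be whatever indices the model uses internally, so they would not necessarily match the 1 to 5 in my example table, and a document with no topic above the model's threshold would get `None`.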
Any ideas? :)