How to iterate over top words in BigARTM?

Asked: 2016-05-14 13:11:45

Tags: python python-2.7

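I want to print each topic name together with the top words for that topic. The BigARTM library was updated from 0.7.6 to v0.8.0, and the old code below stopped working: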

for topic_name in model_artm.topic_names:
    print topic_name + ': ',
    for word in model_artm.score_tracker["top_words"].last_topic_info[topic_name].tokens:
        print word,
    print

The problem is in the second loop: there is no longer any last_topic_info. According to the official manual, we need artm.score_tracker.TopTokensScoreTracker, so it should be written like this:

model_artm.score_tracker["topTokes1"].last_tokens[topic_name].value #doesn't work.

Any ideas what is wrong?

1 Answer:

Answer 0: (score: 2)

The BigARTM Score Tracker API changed slightly between v0.7.9 and v0.8.0. The following example works for v0.8.0:

import artm

# Load pre-built batches and the matching dictionary from the same folder.
batch_vectorizer = artm.BatchVectorizer(data_path=r'D:\Datasets\kos',
                                        data_format='batches')
dictionary = artm.Dictionary(data_path=r'D:\Datasets\kos')

# Register the top-tokens score under a name; the same name is used
# later to look it up in model.score_tracker.
model = artm.ARTM(num_topics=15,
                  num_document_passes=5,
                  dictionary=dictionary,
                  scores=[artm.TopTokensScore(name='top_tokens_score')])

model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=3)

# last_tokens and last_weights are both indexed by topic name and are
# iterated in parallel here.
top_tokens = model.score_tracker['top_tokens_score']
for topic_name in model.topic_names:
    print '\n', topic_name
    for (token, weight) in zip(top_tokens.last_tokens[topic_name],
                               top_tokens.last_weights[topic_name]):
        print token, '-', weight
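
If you want the one-line-per-topic output from the question, the loop above can be wrapped in a small helper. This is only a sketch: format_top_tokens is a hypothetical name, not part of BigARTM, and it assumes that last_tokens and last_weights behave like dictionaries keyed by topic name holding parallel lists, as the loop above suggests. The data below is made up so that the snippet runs without BigARTM installed.

```python
from __future__ import print_function  # keeps the sketch runnable on Python 2.7 and 3


def format_top_tokens(topic_names, last_tokens, last_weights):
    """Return one text line per topic: 'topic: token (weight), ...'.

    Hypothetical helper, not a BigARTM function. It assumes last_tokens
    and last_weights map each topic name to parallel lists.
    """
    lines = []
    for topic_name in topic_names:
        pairs = zip(last_tokens[topic_name], last_weights[topic_name])
        formatted = ', '.join('%s (%.3f)' % (token, weight)
                              for token, weight in pairs)
        lines.append('%s: %s' % (topic_name, formatted))
    return lines


# Made-up stand-ins for model.topic_names, top_tokens.last_tokens
# and top_tokens.last_weights.
topic_names = ['topic_0', 'topic_1']
last_tokens = {'topic_0': ['bush', 'kerry'], 'topic_1': ['iraq', 'war']}
last_weights = {'topic_0': [0.05, 0.04], 'topic_1': [0.03, 0.02]}

for line in format_top_tokens(topic_names, last_tokens, last_weights):
    print(line)
```

With a fitted model you would pass model.topic_names, top_tokens.last_tokens and top_tokens.last_weights instead of the made-up values.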

For other changes in the BigARTM Python API, see the release notes: http://docs.bigartm.org/en/stable/release_notes/python.html