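I want to print each topic name together with the top words for that topic. The BigARTM library was updated from 0.7.6 to v0.8.0, and the old code below stopped working: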
for topic_name in model_artm.topic_names:
    print topic_name + ': ',
    for word in model_artm.score_tracker["top_words"].last_topic_info[topic_name].tokens:
        print word,
    print
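The problem is in the second (inner) loop: there is no longer any last_topic_info attribute.
According to the official manual, we now need artm.score_tracker.TopTokensScoreTracker,
so it seems we should write something like this:
model_artm.score_tracker["topTokes1"].last_tokens[topic_name].value  # doesn't work
Any ideas what is wrong?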
Answer 0 (score: 2)
The BigARTM Score Tracker API changed slightly between v0.7.9 and v0.8.0. The following example works with v0.8.0:
import artm

# Load the pre-built batches and the dictionary from the same folder
batch_vectorizer = artm.BatchVectorizer(data_path=r'D:\Datasets\kos',
                                        data_format='batches')
dictionary = artm.Dictionary(data_path=r'D:\Datasets\kos')

# Register the top-tokens score under a name; the tracker is looked up by that name
model = artm.ARTM(num_topics=15,
                  num_document_passes=5,
                  dictionary=dictionary,
                  scores=[artm.TopTokensScore(name='top_tokens_score')])
model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=3)

top_tokens = model.score_tracker['top_tokens_score']
for topic_name in model.topic_names:
    print '\n', topic_name
    # last_tokens and last_weights are indexed by topic name and hold parallel lists
    for (token, weight) in zip(top_tokens.last_tokens[topic_name],
                               top_tokens.last_weights[topic_name]):
        print token, '-', weight
For other changes to the BigARTM Python API, see the release notes: http://docs.bigartm.org/en/stable/release_notes/python.html
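If the per-topic output is needed in more than one place, the printing loop from the answer could be pulled into a small helper. The following is only a sketch: `format_top_tokens` is a hypothetical name, not part of BigARTM, and it assumes that `last_tokens` and `last_weights` behave like dicts mapping a topic name to parallel lists, which is how the answer's example indexes them. The demo values are made up so the block runs without BigARTM installed.

```python
def format_top_tokens(topic_names, last_tokens, last_weights, max_tokens=None):
    """Return one text line per topic in the form 'topic: token (weight), ...'.

    Hypothetical helper, not a BigARTM function. Assumes last_tokens and
    last_weights are dict-like objects keyed by topic name.
    """
    lines = []
    for topic_name in topic_names:
        # .get() falls back to an empty list if nothing was stored for a topic
        tokens = last_tokens.get(topic_name, [])
        weights = last_weights.get(topic_name, [])
        pairs = list(zip(tokens, weights))
        if max_tokens is not None:
            pairs = pairs[:max_tokens]
        body = ', '.join('{0} ({1:.3f})'.format(t, w) for t, w in pairs)
        lines.append('{0}: {1}'.format(topic_name, body))
    return lines


# Stand-in data shaped like the tracker fields (made-up values, not real output)
demo_tokens = {'topic_0': ['bush', 'kerry'], 'topic_1': ['iraq', 'war']}
demo_weights = {'topic_0': [0.05, 0.04], 'topic_1': [0.03, 0.02]}

for line in format_top_tokens(['topic_0', 'topic_1'], demo_tokens, demo_weights):
    print(line)
```

With a fitted model, the call would presumably look like `format_top_tokens(model.topic_names, top_tokens.last_tokens, top_tokens.last_weights)`. Returning strings rather than printing inside the helper keeps it independent of the Python 2 vs. Python 3 print syntax.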