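I am trying to generate topics using BTM (the Biterm Topic Model). When I try to visualize the topics, I get a validation error. I can print the topics after training the model, but the pyLDAvis step fails.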
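    import numpy as np
    import pyLDAvis
    from sklearn.feature_extraction.text import CountVectorizer
    from biterm.btm import oBTM
    from biterm.utility import vec_to_biterms, topic_summuary

    def btm_model():
        num_topics = 10
        texts = open('./textfiles/Ori-Apr2, 2019.txt').read().splitlines()
        # vectorize texts (one document per line, English stop words removed)
        vec = CountVectorizer(stop_words='english')
        X = vec.fit_transform(texts).toarray()
        # get vocabulary
        vocab = np.array(vec.get_feature_names())
        # get biterms (unordered word pairs) for each document
        biterms = vec_to_biterms(X)
        # create btm
        btm = oBTM(num_topics=num_topics, V=vocab)
        print("\n\n Train Online BTM ..")
        # note: range(0, 1) runs once, so only the first 100 documents are used for training
        for i in range(0, 1):
            biterms_chunk = biterms[i:i + 100]
            btm.fit(biterms_chunk, iterations=10)
        print("\n\n Topic coherence ..")
        res, C_z_sum = topic_summuary(btm.phi_wz.T, X, vocab, 10)
        # document-topic distributions, one row per document
        topics = btm.transform(biterms)
        print("\n\n Visualize Topics ..")
        vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
        pyLDAvis.save_html(vis, './textfiles/online_btm.html')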
Running the pyLDAvis call above produces the following error:
    Traceback (most recent call last):
      File "main_mining.py", line 293, in <module>
        btm_model(num_topics)
      File "main_mining.py", line 187, in btm_model
        vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
      File "C:\Python Install Location\lib\site-packages\pyLDAvis\_prepare.py", line 375, in prepare
        _input_validate(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency)
      File "C:\Python Install Location\lib\site-packages\pyLDAvis\_prepare.py", line 65, in _input_validate
        raise ValidationError('\n' + '\n'.join([' * ' + s for s in res]))
    pyLDAvis._prepare.ValidationError:
     * Not all rows (distributions) in doc_topic_dists sum to 1.
Answer 0 (score: 0)
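In my case, this happened because some of my sentences had only a few tokens. I removed all sentences with fewer than three tokens, and it worked like a charm.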
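The filtering described above could look roughly like the sketch below. It is only an illustration under a few assumptions: `drop_short_docs` and `bad_rows` are hypothetical helper names (they are not part of biterm or pyLDAvis), "tokens" is taken to mean the tokens left in the count matrix after vectorization, and the threshold of three comes from the answer rather than from any library requirement. The likely reason short documents cause trouble is that a document with too few remaining tokens yields no biterms, so its topic row cannot be normalized.

```python
import numpy as np

def drop_short_docs(X, texts, min_tokens=3):
    """Keep only documents with at least `min_tokens` counted tokens.

    X is a dense document-term count matrix (one row per document),
    texts is the matching list of raw documents.
    """
    X = np.asarray(X)
    doc_lengths = X.sum(axis=1)          # tokens per document after vectorization
    keep = doc_lengths >= min_tokens     # boolean mask of documents to keep
    kept_texts = [t for t, k in zip(texts, keep) if k]
    return X[keep], kept_texts

def bad_rows(doc_topic_dists, tol=1e-6):
    """Return indices of rows that do not sum to 1 (NaN rows are included)."""
    d = np.asarray(doc_topic_dists, dtype=float)
    sums = d.sum(axis=1)
    # np.isclose(nan, 1.0) is False, so NaN rows are flagged as well
    return np.flatnonzero(~np.isclose(sums, 1.0, atol=tol))

# Toy example: rows 1 and 3 have fewer than three tokens and are dropped.
X_demo = np.array([[1, 1, 1, 0],
                   [1, 0, 0, 0],
                   [0, 2, 1, 0],
                   [0, 0, 0, 0]])
texts_demo = ["a b c", "a", "b b c", ""]
X_kept, texts_kept = drop_short_docs(X_demo, texts_demo)
```

In the question's code, this would be applied to `X` right after `vec.fit_transform(texts).toarray()` and before `vec_to_biterms(X)`, so that the biterms, `topics`, and the document lengths passed to `pyLDAvis.prepare` all refer to the same filtered set of documents. Calling `bad_rows(topics)` before `pyLDAvis.prepare` is one way to see which documents still trigger the validation error.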