pyLDAvis: Validation error when trying to visualize topics with BTM

Time: 2019-04-16 16:31:24

Tags: python

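I am trying to generate topics with BTM (Biterm Topic Model). I can print the topics after the model is trained, but visualizing them with pyLDAvis fails with a validation error.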

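# Imports assumed from the biterm package's usage; they were not shown in the original post.
import numpy as np
import pyLDAvis
from sklearn.feature_extraction.text import CountVectorizer
from biterm.btm import oBTM
from biterm.utility import vec_to_biterms, topic_summuary

def btm_model():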
    num_topics = 10
    texts = open('./textfiles/Ori-Apr2, 2019.txt').read().splitlines()
    # vectorize texts
    vec = CountVectorizer(stop_words='english')
    X = vec.fit_transform(texts).toarray()
    # get vocabulary (scikit-learn 1.2+ removed get_feature_names(); use get_feature_names_out() there)
    vocab = np.array(vec.get_feature_names())
    # get biterms
    biterms = vec_to_biterms(X)
    # create btm
    btm = oBTM(num_topics = num_topics, V = vocab)
    print("\n\n Train Online BTM ..")
    for i in range(0, 1):  # runs once, so only biterms[0:100] are used for training
        biterms_chunk = biterms[i:i + 100]
        btm.fit(biterms_chunk, iterations=10)

    print("\n\n Topic coherence ..")
    res, C_z_sum = topic_summuary(btm.phi_wz.T, X, vocab, 10)

    topics = btm.transform(biterms)
    print("\n\n Visualize Topics ..")
    vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
    pyLDAvis.save_html(vis, './textfiles/online_btm.html')

Running the pyLDAvis code above produces the following error:

Traceback (most recent call last):
  File "main_mining.py", line 293, in <module>
    btm_model(num_topics)
  File "main_mining.py", line 187, in btm_model
    vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
  File "C:\Python Install Location\lib\site-packages\pyLDAvis\_prepare.py", line 375, in prepare
    _input_validate(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency)
  File "C:\Python Install Location\lib\site-packages\pyLDAvis\_prepare.py", line 65, in _input_validate
    raise ValidationError('\n' + '\n'.join([' * ' + s for s in res]))
pyLDAvis._prepare.ValidationError:
 * Not all rows (distributions) in doc_topic_dists sum to 1.
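
The message says that at least one row of the second argument (`topics`, the output of `btm.transform`) is not a valid probability distribution. A minimal sketch for locating such rows before calling `pyLDAvis.prepare` follows. The helper name `rows_not_summing_to_one` and the tolerance are assumptions made for illustration; this is not pyLDAvis's own check.

```python
import numpy as np

def rows_not_summing_to_one(doc_topic_dists, tol=1e-2):
    """Return indices of rows whose sum is NaN or differs from 1 by more than tol.

    tol is an illustrative tolerance, not the value pyLDAvis uses internally.
    """
    dists = np.asarray(doc_topic_dists, dtype=float)
    row_sums = dists.sum(axis=1)
    # np.isclose returns False for NaN, so NaN rows are reported as bad too.
    return np.flatnonzero(~np.isclose(row_sums, 1.0, atol=tol))

# Synthetic example: row 1 is all NaN, row 2 sums to 0.6.
example = np.array([[0.5, 0.5], [np.nan, np.nan], [0.3, 0.3]])
bad_rows = rows_not_summing_to_one(example)
print(bad_rows)
```

Applied to the code above, `rows_not_summing_to_one(topics)` would give the document indices to inspect, and `X[bad_rows].sum(axis=1)` would show how many tokens each of those documents has.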

1 answer:

Answer 0 (score: 0)

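In my case, this happened because some of my sentences had only a few tokens. I removed all sentences with fewer than three tokens, and it worked like a charm.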
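
The filtering step described here could look roughly like the sketch below. It assumes the count matrix `X` from `CountVectorizer` is used to measure document length (so stop words are already excluded); the helper name `drop_short_documents` and the `min_tokens` parameter are hypothetical. A likely reason the fix works is that very short documents produce no biterms, leaving their topic rows undefined, but that is an assumption about the BTM implementation and not something confirmed in the thread.

```python
import numpy as np

def drop_short_documents(texts, X, min_tokens=3):
    """Keep only documents with at least min_tokens counted tokens.

    texts: list of raw documents; X: document-term count matrix (one row per document).
    Returns the filtered texts and the matching rows of X.
    """
    X = np.asarray(X)
    keep = X.sum(axis=1) >= min_tokens  # total token occurrences per document
    kept_texts = [t for t, k in zip(texts, keep) if k]
    return kept_texts, X[keep]

# Synthetic example: documents 0 and 2 have fewer than three counted tokens.
texts = ["cat", "cat dog bird", "", "dog dog bird"]
X = np.array([[1, 0, 0], [1, 1, 1], [0, 0, 0], [0, 2, 1]])
kept_texts, X_kept = drop_short_documents(texts, X)
print(kept_texts, X_kept.shape)
```

In the question's code, this would go right after `X = vec.fit_transform(texts).toarray()` and before `vec_to_biterms(X)`, so that the biterms, document lengths, and term frequencies are all computed from the same filtered matrix. Re-fitting the vectorizer on the kept texts is another option if the vocabulary should match the remaining documents exactly.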