Question

/Users/Barry/anaconda/lib/python2.7/site-packages/gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars convergence = np.fabs((bound - old_bound) / old_bound)

# dynamic topic model
from collections import Counter

from gensim import corpora
from gensim.models import ldaseqmodel


def run_dtm(num_topics=18):
    # preprocessing() is my own helper (defined elsewhere, not shown)
    docs, years, titles = preprocessing(datasetType=2)

    # re-sort documents by year only (avoids comparing the documents themselves)
    Z = sorted(zip(years, docs), key=lambda pair: pair[0])
    years_new, docs_new = zip(*Z)

    # generate time slices: documents per year, in chronological order
    year_counts = Counter(years_new)
    time_slice = [year_counts[year] for year in sorted(year_counts)]

    for year in sorted(year_counts):
        print('{} --- {}'.format(year, year_counts[year]))

    print('********* data set loaded ********')
    dictionary = corpora.Dictionary(docs_new)
    corpus = [dictionary.doc2bow(text) for text in docs_new]

    print('********* train lda seq model ********')
    ldaseq = ldaseqmodel.LdaSeqModel(corpus=corpus, id2word=dictionary, time_slice=time_slice, num_topics=num_topics)

    print('********* lda seq model done ********')
    ldaseq.print_topics(time=1)

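Hello everyone, I am using the dynamic topic model from the gensim package for topic analysis, following this tutorial: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/ldaseqmodel.ipynb. However, I always run into the same unexpected error. Can anyone give me some guidance? I am quite puzzled, because the error persists even though I have tried several different datasets for generating the corpus and dictionary. The error is: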

/Users/Barry/anaconda/lib/python2.7/site-packages/gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars convergence = np.fabs((bound - old_bound) / old_bound)

Answer 1

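The np.fabs in the message indicates that the problem is raised inside a NumPy computation. Which NumPy and gensim versions are you using?

NumPy no longer supports Python 2.7, and Ldaseq was added to Gensim in 2016, so you may simply not have a compatible version available. If you are recoding a Python 3+ tutorial into a 2.7 variant, you obviously have some understanding of the version differences, so try running it in a 3.6.8 environment (you will have to upgrade anyway, since Python itself ends 2.7 support in 2020). That alone might help: I have gone through the tutorial with my own data and did not run into this.

That said, I encountered the same error when running LdaMulticore, and there it was caused by an empty corpus.

Instead of running the code entirely inside a function, try going through it line by line (or look at the DEBUG-level logs) and check that each output has the expected properties. For example, is your corpus non-empty, and is it free of empty documents?

If that turns out to be the problem, fix your preprocessing step and try again. That at least helped me, and it helped with the same ldamodel error in the mailing list.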
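The check suggested above could be sketched roughly as follows. This is only an illustration, not code from gensim or from the question: drop_empty_documents is a hypothetical helper, and it assumes the corpus is a list of bag-of-words documents with one time stamp per document. It removes empty documents and rebuilds time_slice so the counts still match the corpus.

```python
from collections import Counter


def drop_empty_documents(corpus, years):
    """Remove empty bag-of-words documents and rebuild time_slice.

    corpus: list of documents, each a list of (token_id, count) pairs
    years:  list of time stamps, one per document, in the same order
    (Hypothetical helper for illustration; not part of gensim.)
    """
    kept = [(year, bow) for year, bow in zip(years, corpus) if len(bow) > 0]
    if not kept:
        raise ValueError("corpus is empty after removing empty documents")
    # Sort by time stamp only, so documents are grouped by time slice.
    kept.sort(key=lambda pair: pair[0])
    kept_corpus = [bow for _, bow in kept]
    counts = Counter(year for year, _ in kept)
    # One entry per time stamp, in chronological order.
    time_slice = [counts[year] for year in sorted(counts)]
    return kept_corpus, time_slice


# Toy data: the second document is empty.
corpus = [[(0, 1), (1, 2)], [], [(2, 1)], [(0, 3)]]
years = [2001, 2001, 2002, 2002]
clean_corpus, time_slice = drop_empty_documents(corpus, years)
print(len(clean_corpus), time_slice)
```

The important detail is that time_slice is recomputed after filtering. If documents are dropped but the old counts are kept, the slice sizes no longer add up to the corpus length.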

PS: I am posting this as an answer rather than a comment because I do not have the reputation; feel free to edit it.

Answer 2

This is an issue in the source code of ldaseqmodel.py itself. With the latest gensim package (version 3.8.3), I get the same error at line 293:

ldaseqmodel.py:293: RuntimeWarning: divide by zero encountered in double_scalars
  convergence = np.fabs((bound - old_bound) / old_bound)

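If you step through the code around that line, you will see that the difference between bound and old_bound is divided by old_bound (this is also visible in the warning itself).

If you dig further, you will see that old_bound is initialized to zero at line 263, and that is the main reason you get the warning: divide by zero encountered.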
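The effect can be reproduced outside gensim with a minimal sketch. The bound values below are made up for illustration, and relative_change is a hypothetical name; only the expression itself is taken from the warning message.

```python
import warnings

import numpy as np


def relative_change(bound, old_bound):
    # Same expression as the one quoted in the warning message.
    return np.fabs((bound - old_bound) / old_bound)


bound = np.float64(-1234.5)  # made-up value, for illustration only
old_bound = np.float64(0.0)  # the initial value described above

# Record warnings so that the RuntimeWarning can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    first_iteration = relative_change(bound, old_bound)

# From the second iteration on, old_bound holds the previous bound,
# so the division is well defined.
second_iteration = relative_change(np.float64(-1200.0), bound)

print(first_iteration, [str(w.message) for w in caught])
print(second_iteration)
```

With NumPy scalars, the first call yields inf together with a RuntimeWarning rather than raising an exception, which would explain why training continues after the warning is printed.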


To find out more, I placed a print statement at line 294:

print('bound = {}, old_bound = {}'.format(bound, old_bound))

The output I received was: (screenshot not preserved)

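So, in short, you get this warning because of the source code of ldaseqmodel.py, not because of empty documents. However, if you do not remove empty documents from your corpus, you will get a different warning. I therefore suggest removing any empty documents from your corpus and ignoring the divide-by-zero warning above.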
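If the warning is only noise, one possible way to silence it is NumPy's np.errstate context manager. The sketch below is an assumption-laden illustration: train_model is a hypothetical stand-in for the LdaSeqModel(...) call, with made-up values that trigger the same division. Whether this is sufficient for a real gensim run has not been verified here.

```python
import numpy as np


def train_model():
    # Hypothetical stand-in for the LdaSeqModel(...) call: it performs
    # the same kind of division by an initial bound of zero.
    old_bound = np.float64(0.0)
    bound = np.float64(-1234.5)  # made-up value, for illustration only
    return np.fabs((bound - old_bound) / old_bound)


# Floating-point divide-by-zero warnings are suppressed only inside
# this block; other code keeps NumPy's default behaviour.
with np.errstate(divide="ignore"):
    result = train_model()

print(result)
```

Limiting the suppression to the training call keeps genuine divide-by-zero problems elsewhere in the program visible.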
