Question

I am experimenting with the LatentDirichletAllocation() class in scikit-learn, and the documentation describes the evaluate_every parameter as follows:

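How often to evaluate perplexity. Only used in the fit method. Set it to 0 or a negative number to not evaluate perplexity during training at all. Evaluating perplexity can help you check convergence in the training process, but it will also increase total training time. Evaluating perplexity in every iteration might increase training time up to two-fold.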

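I set this parameter to 2 (the default is 0) and training did take longer, but I can't find the perplexity values anywhere. Are they saved somewhere, or does the model only use them to decide when to stop? I was hoping to use the perplexity values to track my model's progress and plot its learning curve.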

Answer 1

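The value is used together with the perp_tol parameter to check for convergence, and according to the source it is not saved between iterations: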

# last_bound is initialized to None before this loop
for i in xrange(max_iter):
    # ... (EM update for this iteration elided)

    # check perplexity
    if evaluate_every > 0 and (i + 1) % evaluate_every == 0:
        doc_topics_distr, _ = self._e_step(X, cal_sstats=False,
                                           random_init=False,
                                           parallel=parallel)
        bound = self.perplexity(X, doc_topics_distr,
                                sub_sampling=False)
        if self.verbose:
            print('iteration: %d, perplexity: %.4f'
                  % (i + 1, bound))

        # stop once perplexity changes by less than perp_tol between evaluations
        if last_bound and abs(last_bound - bound) < self.perp_tol:
            break
        last_bound = bound  # only the most recent value is kept
    self.n_iter_ += 1
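Because the loop above prints each value when verbose is enabled, one workaround that needs no changes to scikit-learn is to capture that output and parse it. This is a minimal sketch, assuming the estimator was built with evaluate_every > 0 and verbose >= 1 and that the message keeps the "iteration: ..., perplexity: ..." shape shown above; the helper names parse_perplexity_log and fit_and_record_perplexity are hypothetical. The regular expression tolerates extra text between the two fields, since the exact wording of the message may vary between scikit-learn versions.

```python
import io
import re
from contextlib import redirect_stdout

# Matches "iteration: 2, perplexity: 1234.5678" and also tolerates extra text
# between the iteration number and the perplexity field (wording may vary).
_PERPLEXITY_LINE = re.compile(r"iteration:\s*(\d+)\b.*?perplexity:\s*(\S+)")


def parse_perplexity_log(text):
    """Return (iteration, perplexity) pairs found in verbose LDA output."""
    history = []
    for line in text.splitlines():
        match = _PERPLEXITY_LINE.search(line)
        if match:  # lines without a perplexity value are skipped
            history.append((int(match.group(1)), float(match.group(2))))
    return history


def fit_and_record_perplexity(lda, X):
    """Fit `lda` while capturing its stdout and return the perplexity history.

    Assumes `lda` was created with evaluate_every > 0 and verbose >= 1, so that
    fit() prints one perplexity line per evaluation.
    """
    buffer = io.StringIO()
    with redirect_stdout(buffer):  # collect the verbose prints instead of showing them
        lda.fit(X)
    return parse_perplexity_log(buffer.getvalue())
```

For example, fit_and_record_perplexity(LatentDirichletAllocation(evaluate_every=2, verbose=1), X) should return one (iteration, perplexity) pair per evaluation, which can then be plotted as a learning curve. Note that the printed values are rounded to four decimal places.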

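Note that you could easily adapt the existing source by (1) adding the line self.saved_bounds = [] to the __init__ method and (2) appending each bound to that list inside the loop above, for example like this:

self.saved_bounds.append(bound)  # record every evaluation, including the last one before a break
if last_bound and abs(last_bound - bound) < self.perp_tol:
    break
last_bound = bound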

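Depending on where you save the modified class, you will also have to adjust the imports at the top of the file, replacing the relative imports with full module paths within scikit-learn (sklearn.…).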
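If patching the library is undesirable, a rough alternative that uses only public methods is to drive training yourself with partial_fit and call perplexity after each pass. Note that this swaps in a different procedure: partial_fit performs online (mini-batch) variational Bayes updates, so the resulting curve approximates, but will not match, what fit computes internally with learning_method='batch'. Because partial_fit uses total_samples as the corpus size, it is worth setting that parameter to the number of documents in X. The function name perplexity_curve is hypothetical.

```python
def perplexity_curve(lda, X, n_passes):
    """Record perplexity after each online pass over X (hypothetical helper).

    Assumes `lda` is an unfitted LatentDirichletAllocation whose total_samples
    is set to the number of documents in X. partial_fit applies online
    (mini-batch) updates, so this only approximates the batch fit() curve.
    """
    history = []
    for _ in range(n_passes):
        lda.partial_fit(X)                  # one online pass over all mini-batches of X
        history.append(lda.perplexity(X))  # evaluate on the full data after the pass
    return history
```

Each pass costs one extra perplexity evaluation over the full X, so, just as with evaluate_every, recording the curve this way adds training time.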
