Question

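I set language_model_penalty_non_dict_word through a config file for Tesseract 3.01, but its value has no effect. I have tried several images and several values for each, and the output for every image is always the same. Another user noticed the same thing in a comment on another question.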

Edit: After looking through the source, I found that the variable language_model_penalty_non_dict_word is used in only one function, float LanguageModel::ComputeAdjustedPathCost.

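However, this function is never called! It is referenced by only two functions: LanguageModel::UpdateBestChoice() and LanguageModel::AddViterbiStateEntry(). I placed breakpoints in both of them, and they are never hit either.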

Answer 1

After some debugging, I finally found the cause: the function Wordrec::SegSearch() was not being called (it sits in the call graph that leads to LanguageModel::ComputeAdjustedPathCost()).

The call is guarded by a flag, as this code shows:

  if (enable_new_segsearch) {
    SegSearch(&chunks_record, word->best_choice,
              best_char_choices, word->raw_choice, state);
  } else {
    best_first_search(&chunks_record, best_char_choices, word,
                      state, fixpt, best_state);
  }

So you need to set enable_new_segsearch in the config file:

enable_new_segsearch    1

language_model_penalty_non_freq_dict_word 0.2
language_model_penalty_non_dict_word 0.3
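
To guard against the same mistake, a config can be checked before running OCR. The following is only a minimal sketch and is not part of Tesseract: ParseConfig and LanguageModelPenaltiesIgnored are hypothetical helper names, the parser assumes one whitespace-separated "name value" pair per line, and it assumes that language_model_penalty_non_freq_dict_word is gated by the same flag as language_model_penalty_non_dict_word.

```cpp
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> Params;

// Parse config text with one "name value" pair per line, separated by
// whitespace. Lines without both fields (e.g. blank lines) are skipped.
Params ParseConfig(const std::string& text) {
  Params params;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string name, value;
    if (fields >> name >> value) params[name] = value;
  }
  return params;
}

// Hypothetical check: returns true when the config sets one of the
// language-model penalties but does not set enable_new_segsearch to 1,
// which is the situation described in the question.
// Simplification: only the literal value "1" counts as enabled.
bool LanguageModelPenaltiesIgnored(const Params& params) {
  static const char* const kPenalties[] = {
      "language_model_penalty_non_dict_word",
      // Assumption: gated by the same flag as the parameter above.
      "language_model_penalty_non_freq_dict_word",
  };
  bool sets_penalty = false;
  for (const char* name : kPenalties) {
    if (params.count(name) > 0) sets_penalty = true;
  }
  Params::const_iterator it = params.find("enable_new_segsearch");
  const bool segsearch_on = (it != params.end() && it->second == "1");
  return sets_penalty && !segsearch_on;
}
```

Passing the text of the config above to ParseConfig and then to LanguageModelPenaltiesIgnored returns false, while the same config without the enable_new_segsearch line returns true.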

"language_model_penalty_non_dict_word" has no effect in Tesseract 3.01
