Question

I'm trying to OCR several tables with Tesseract. The tables have the following format:

Item One name                       Item One category
(Item description if any)

Item Two name                       Item Two category
(Item description if any)

There is some whitespace between the name and the category. The output produced looks like this:

Item One name
(Item description if any)

Item Two name
(Item description if any)


Item One category

Item Two category

Is there a way to get the output for each whole line, rather than this column-wise output where one column is printed beneath the other?

I run Tesseract from a simple command line:

tesseract ~/Desktop/imagename.jpg out

Answer 1

Try a different page segmentation mode (PSM), such as 4 or 6.
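As a rough sketch of that suggestion, the command from the question can be wrapped so that the segmentation mode is passed explicitly. The helper name `ocr_rows` and its default of 6 are assumptions made for illustration only. Mode 4 treats the page as a single column of text of variable sizes and mode 6 as a single uniform block of text. Which of the two keeps the name and category on one line depends on the image, so both are worth trying.

```shell
# Hypothetical helper: run Tesseract with a chosen page segmentation mode.
# Usage: ocr_rows IMAGE OUTPUT_BASE [PSM]
ocr_rows() {
    image=$1
    outbase=$2
    psm=${3:-6}   # default chosen for this sketch: 6 = single uniform block of text
    # Tesseract 4 and later spell the option --psm; 3.x releases use -psm.
    tesseract "$image" "$outbase" --psm "$psm"
}
```

With the file from the question, `ocr_rows ~/Desktop/imagename.jpg out 6` would write the result to out.txt; if the categories are still separated from the names, `ocr_rows ~/Desktop/imagename.jpg out 4` is the other mode to try.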
