Question

I am running Tesseract 4.0.0 and tried the following command to create a searchable PDF, but it does not seem to work:

tesseract input output pdf

The PDF file is created, but it cannot be opened. I tried different image formats (jpg, tif, png) without success.

Answer 1

It does work. I'm not sure which operating system you are using, but I found that to get it running on Linux I had to do a full installation:

sudo apt install tesseract-ocr
sudo apt install tesseract-ocr-all

Then, for example, for a German document that was originally a multi-page tif:

tesseract multipage-tiff.tif out pdf -l deu
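
If the output still will not open, it may help to check each step separately. The following is only a rough sketch, not part of the original answer: `is_pdf` and `ocr_to_pdf` are hypothetical helper names, and it assumes a POSIX-like shell with `head -c` available. It confirms that the requested language pack is listed by `tesseract --list-langs`, runs the same command as above, and then checks that the result starts with the `%PDF-` signature that every PDF file begins with.

```shell
#!/bin/sh

# Hypothetical helper: succeeds if the file starts with the "%PDF-" signature.
is_pdf() {
    [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Hypothetical wrapper around the command from the answer.
# Usage: ocr_to_pdf <image> <output-base> <lang>
ocr_to_pdf() {
    # Fail early if tesseract is not installed or not on PATH.
    command -v tesseract >/dev/null 2>&1 || {
        echo "tesseract not found on PATH" >&2
        return 1
    }
    # --list-langs prints one installed language code per line.
    tesseract --list-langs 2>&1 | grep -qx "$3" || {
        echo "language '$3' is not installed" >&2
        return 1
    }
    # Same invocation as in the answer: tesseract <image> <output-base> pdf -l <lang>
    tesseract "$1" "$2" pdf -l "$3" || return 1
    # tesseract appends ".pdf" to the output base name.
    is_pdf "$2.pdf"
}
```

With these functions loaded, `ocr_to_pdf multipage-tiff.tif out deu` would return a non-zero status if the language pack is missing or if `out.pdf` does not look like a PDF, which narrows down where the problem is.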

The manual is useful: https://github.com/tesseract-ocr/tesseract/wiki

Tesseract searchable PDF creation not working
