I installed the OCRmyPDF package in the conda environment where I use pytesseract. When I run the command "ocrmypdf --help", I get the following error:
[WinError 2] The system cannot find the file specified
Traceback (most recent call last):
File "c:\users\{user}\anaconda3\envs\tesseract\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\{user}\anaconda3\envs\tesseract\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\{user}\Anaconda3\envs\tesseract\Scripts\ocrmypdf.exe\__main__.py", line 4, in <module>
File "c:\users\{user}\anaconda3\envs\tesseract\lib\site-packages\ocrmypdf\__init__.py", line 10, in <module>
from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
File "c:\users\{user}\anaconda3\envs\tesseract\lib\site-packages\ocrmypdf\leptonica.py", line 44, in <module>
raise MissingDependencyError(
ocrmypdf.exceptions.MissingDependencyError:
---------------------------------------------------------------------
This error normally occurs when ocrmypdf can't find the Leptonica
library, which is usually installed with Tesseract OCR. It could be that
Tesseract is not installed properly, we can't find the installation
on your system PATH environment variable.
The library we are looking for is usually called:
liblept-5.dll (Windows)
liblept*.dylib (macOS)
liblept*.so (Linux/BSD)
Please review our installation procedures to find a solution:
https://ocrmypdf.readthedocs.io/en/latest/installation.html
---------------------------------------------------------------------
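Before anyone asks: yes, I do have Tesseract installed, since I have been using pytesseract successfully. I suspect the problem is that I installed Tesseract with conda, so it lives inside my environment, rather than downloading the source and compiling it directly on Windows. With pytesseract, I can point the variable that pytesseract uses to call "tesseract" at the Tesseract executable by putting
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\{user}\Anaconda3\envs\tesseract\Library\bin\tesseract.exe'
in my script. I searched the OCRmyPDF documentation and the source code itself for a variable or command-line argument to which I could similarly assign the location, but without success. Is there a similar workaround, or do I have to compile Tesseract directly on Windows for OCRmyPDF to work?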
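Also, I saw this thread saying that I can add the conda environment to my system PATH, but I'm not sure whether that would let OCRmyPDF access the Tesseract and Leptonica packages and solve the problem, or whether it would cause other problems. To be honest, my knowledge of Windows from a programming perspective is extremely limited.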
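
For concreteness, here is a rough sketch of what I understand the PATH approach to mean, limited to a single Python process instead of a system-wide change. It only uses the standard library. The helper names `env_with_prepended_path` and `find_on_path` are made up for illustration, the `Library\bin` location is an assumption based on where conda put `tesseract.exe` for me, and the DLL name is simply the one from the error message (conda may name it differently). I have not confirmed that this makes OCRmyPDF find Leptonica.

```python
import os
import shutil
from pathlib import Path


def env_with_prepended_path(bin_dir, base_env=None):
    """Return a copy of the environment with bin_dir placed first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    old_path = env.get("PATH", "")
    env["PATH"] = str(bin_dir) + (os.pathsep + old_path if old_path else "")
    return env


def find_on_path(name, env):
    """Look up an executable using the PATH stored in env (None if absent)."""
    return shutil.which(name, path=env.get("PATH", ""))


if __name__ == "__main__":
    # Assumption: conda keeps tesseract.exe and the Leptonica DLL in the
    # environment's Library\bin folder; {user} is the same placeholder as above.
    conda_bin = Path(r"C:\Users\{user}\Anaconda3\envs\tesseract\Library\bin")
    env = env_with_prepended_path(conda_bin)

    # Check what a lookup through the modified PATH would find.
    print("tesseract found at:", find_on_path("tesseract", env))
    # File name taken from the error message; it may differ in a conda install.
    print("liblept-5.dll present:", (conda_bin / "liblept-5.dll").exists())
```

The returned dictionary could then be passed as `env=` to `subprocess.run` when launching a command, so the change would apply only to that child process and not to the rest of the system.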