Error while installing tesseract-ocr

Time: 2018-06-17 12:14:09

Tags: python windows-8.1 ocr command-prompt tesseract

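I want to use pytesseract for OCR, so I am installing it. Before that, though, I need to install tesseract-ocr. I am using Windows 8.1. I opened the command prompt and ran the command pip install tesseract-ocr. The output of that command is shown below.

I can't understand what is happening here. How should I read this output, and how can I install tesseract successfully on my PC?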

C:\Users\HarshLaptop>pip install tesseract-ocr
Collecting tesseract-ocr
  Using cached https://files.pythonhosted.org/packages/e2/0d/dcee3dd0fc4c7bcd181
25a98f8ba6d9db7aecaa40770595203e312649587/tesseract-ocr-0.0.1.tar.gz
Requirement already satisfied: cython in c:\users\harshlaptop\anaconda3\lib\site
-packages (from tesseract-ocr) (0.25.2)
Building wheels for collected packages: tesseract-ocr
  Running setup.py bdist_wheel for tesseract-ocr ... error
  Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c "
import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\
\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open
)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __f
ile__, 'exec'))" bdist_wheel -d C:\Users\HARSHL~1\AppData\Local\Temp\pip-wheel-s
j29zfyo --python-tag cp36:
  running bdist_wheel
  running build
  running build_py
  file tesseract_ocr.py (for module tesseract_ocr) not found
  file tesseract_ocr.py (for module tesseract_ocr) not found
  running build_ext
  building 'tesseract_ocr' extension
  creating build
  creating build\temp.win-amd64-3.6
  creating build\temp.win-amd64-3.6\Release
  C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c
 /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic:\
users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual S
tudio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10
240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Pro
gram Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows
Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6\R
elease\tesseract_ocr.obj
  tesseract_ocr.cpp
  tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonic
a/allheaders.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN
\\x86_amd64\\cl.exe' failed with exit status 2

  ----------------------------------------
  Failed building wheel for tesseract-ocr
  Running setup.py clean for tesseract-ocr
Failed to build tesseract-ocr
Installing collected packages: tesseract-ocr
  Running setup.py install for tesseract-ocr ... error
    Complete output from command c:\users\harshlaptop\anaconda3\python.exe -u -c
 "import setuptools, tokenize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Tem
p\\pip-install-x8nz3uhm\\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', op
en)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, _
_file__, 'exec'))" install --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-rec
ord-vnlr99lk\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    file tesseract_ocr.py (for module tesseract_ocr) not found
    file tesseract_ocr.py (for module tesseract_ocr) not found
    running build_ext
    building 'tesseract_ocr' extension
    creating build
    creating build\temp.win-amd64-3.6
    creating build\temp.win-amd64-3.6\Release
    C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe
/c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\harshlaptop\anaconda3\include -Ic
:\users\harshlaptop\anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual
 Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.
10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\P
rogram Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Window
s Kits\8.1\include\winrt" /EHsc /Tptesseract_ocr.cpp /Fobuild\temp.win-amd64-3.6
\Release\tesseract_ocr.obj
    tesseract_ocr.cpp
    tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'lepton
ica/allheaders.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\B
IN\\x86_amd64\\cl.exe' failed with exit status 2

    ----------------------------------------
Command "c:\users\harshlaptop\anaconda3\python.exe -u -c "import setuptools, tok
enize;__file__='C:\\Users\\HARSHL~1\\AppData\\Local\\Temp\\pip-install-x8nz3uhm\
\tesseract-ocr\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.rea
d().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" insta
ll --record C:\Users\HARSHL~1\AppData\Local\Temp\pip-record-vnlr99lk\install-rec
ord.txt --single-version-externally-managed --compile" failed with error code 1
in C:\Users\HARSHL~1\AppData\Local\Temp\pip-install-x8nz3uhm\tesseract-ocr\

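3 answers:

Answer 0 (score: 2)

I had the same problem, on a Windows 10 machine with Python 3.6 and Visual Studio 2017 installed. What worked for me was:

  1. Download and install the tesseract-ocr executable from https://github.com/UB-Mannheim/tesseract/wiki (the script below assumes you are running on Windows and installed tesseract to the suggested default location, i.e. C:\Program Files (x86)\Tesseract-OCR). See https://github.com/tesseract-ocr/tesseract/wiki for more information on installing on different operating systems, including Windows, using the pre-built binary packages.
  2. Make sure the Python Imaging Library ('PIL') or the 'pillow' package, which is used to open images, is installed. (Installing PIL did not work on my setup, but pillow did, i.e. pip install pillow.) You need this because pytesseract depends on it. See https://pypi.org/project/pytesseract/0.2.5/ for more information.
  3. Then, to use it successfully in your code, just set the tesseract_cmd path inside your code, as follows: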

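    from PIL import Image
    import pytesseract

    # Point pytesseract at the Tesseract executable (default installer location).
    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

    # 'path/to/image.png' is a placeholder; replace it with your own image file.
    img = Image.open('path/to/image.png')
    text = pytesseract.image_to_string(img)
    print(text)

Hope that helps.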
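If you would rather not hard-code the location, something like the following might work. It is only a sketch: the helper name find_tesseract is made up for illustration, and the second entry in DEFAULT_LOCATIONS is an assumption about where other installer versions may put the files, not something taken from this thread.

```python
import os
import shutil

# Assumed install locations; adjust these to match your own machine.
DEFAULT_LOCATIONS = [
    r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe',
    r'C:\Program Files\Tesseract-OCR\tesseract.exe',  # assumption, not from the thread
]


def find_tesseract(candidates=DEFAULT_LOCATIONS, name='tesseract'):
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer an executable that is already reachable through PATH.
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate file that actually exists.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

When the result is not None, it can be assigned to pytesseract.pytesseract.tesseract_cmd in place of the literal path used above.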

Answer 1 (score: 0)

You need to install leptonica. Tesseract requires it.
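To see which header the build is actually missing, you can pull the C1083 lines out of the pip output. The snippet below is a rough sketch and the function name missing_headers is hypothetical. It assumes the log was hard-wrapped by the console, as in the question, so it joins the lines back together before matching.

```python
import re

# MSVC reports a header it cannot find as:
#   fatal error C1083: Cannot open include file: '<name>': No such file or directory
_C1083 = re.compile(r"fatal error C1083: Cannot open include file:\s*'([^']+)'")


def missing_headers(build_log):
    """Return the distinct header names the compiler could not open, in order."""
    # The console wraps long lines mid-word, so undo the wrapping first.
    joined = build_log.replace('\r', '').replace('\n', '')
    seen = []
    for name in _C1083.findall(joined):
        if name not in seen:
            seen.append(name)
    return seen


log = (
    "  tesseract_ocr.cpp(463): fatal error C1083: Cannot open include file: 'leptonic\n"
    "a/allheaders.h': No such file or directory\n"
)
print(missing_headers(log))  # ['leptonica/allheaders.h']
```

Here the missing file is leptonica/allheaders.h, which is why the compiler step fails before anything gets installed.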

Answer 2 (score: 0)

To install leptonica, you need to follow this link

conda install -c conda-forge leptonica

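However, this is by no means a complete solution for getting rid of the error when installing tesseract-ocr.

You need to install tesseract using the Windows installer available here. Then you should install the Python wrapper with: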

pip install pytesseract

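Last but not least, after importing the pytesseract library you should also set the tesseract path in your script, as follows (note that the install path may be different in your case!):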

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
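
Before running any OCR, it may help to check that the path you set really points at a working executable. The following is a minimal sketch, assuming the binary answers to --version; the helper name tesseract_version is made up for this example and is not part of pytesseract.

```python
import subprocess


def tesseract_version(cmd):
    """Return the first line printed by `<cmd> --version`, or None if cmd cannot be run."""
    try:
        result = subprocess.run(
            [cmd, '--version'],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some builds print the version on stderr
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a wrong path or a file that is not executable.
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

If tesseract_version(r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe') returns None, the path is most likely wrong for your machine and should be corrected before calling pytesseract.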