Pytesseract: Windows Error [Error 2] "The system cannot find the file specified" when calling tesseract OCR

Time: 2016-12-13 16:23:09

Tags: python tesseract python-tesseract

I am trying to get tesseract OCR working through Anaconda, which is based on Python 2.7. After various changes along the way, this is the code I ended up with.

> import os
> from PIL import *
> from PIL import Image
> from tesseract import *                #different : quantum simulations
>
> import pytesseract
>
> print os.getcwd()
> im = Image.open('D:\File_conv\phototest.tif') #to be sure of path
> im.load()
> print im
> text = pytesseract.image_to_string(im)       #Generates error
> import pytesseract
> print(pytesseract.image_to_string(Image.open(
> 'D:/File_conv/phototest.tif')))                #
> print(pytesseract.image_to_string(Image.open('test-european.jpg'),
> lang='fra'))                                  #Same error

The call to image_to_string raises Windows Error [Error 2]:

> text = pytesseract.image_to_string(im)
> Traceback (most recent call last):
>
>   File "<ipython-input-92-1f75dd6f29f3>", line 1, in <module>
>     text = pytesseract.image_to_string(im)
>
>   File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
>     boxes=boxes,
>
>   File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
>     proc = subprocess.Popen(command,
>
>   File "C:\Program Files (x86)\Anaconda2\lib\subprocess.py", line 711, in __init__
>     errread, errwrite)
>
>   File "C:\Program Files (x86)\Anaconda2\lib\subprocess.py", line 959, in _execute_child
>     startupinfo)
>
> WindowsError: [Error 2] The system cannot find the file specified

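I have tried everything I could find. I could not find a distribution for Windows and conda, so I manually extracted pytesser into Anaconda2\Lib and modified __init__.py to point to the tesseract 3.02 installation. It gave the same error. Then I tried pytesseract, installed via

> pip install pytesseract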

The system variable TESSDATA_PREFIX and the path variable used by image_to_string both point correctly to:

> C:\Program Files (x86)\Tesseract-OCR

I cannot figure out which path reference is wrong.
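Error 2 raised from subprocess.Popen typically means the executable named in the command could not be found, not the image file. A minimal sketch for checking that, assuming the install directory quoted above and a binary named tesseract.exe (the helper name find_tesseract and the binary name are assumptions, not taken from the traceback):

```python
import os


def find_tesseract(extra_dirs=(), names=("tesseract.exe", "tesseract")):
    """Return the first existing tesseract executable path, or None.

    Hypothetical helper: searches extra_dirs first, then every PATH entry.
    """
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    for directory in list(extra_dirs) + path_dirs:
        if not directory:
            continue  # skip empty PATH entries
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None


# Install directory taken from the question; the binary name is assumed.
found = find_tesseract([r"C:\Program Files (x86)\Tesseract-OCR"])
print(found)
# If a path is found, pytesseract can be pointed at it explicitly:
#   pytesseract.pytesseract.tesseract_cmd = found
```

If this prints None, the tesseract binary is neither in that directory nor on PATH, which would be consistent with the error above. Note that TESSDATA_PREFIX only tells tesseract where its language data lives; it does not help Python locate the executable.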

Edit: the same error is shown at print command:

  File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    print command

  File "C:\Program Files (x86)\Anaconda2\lib\subprocess.py", line 711, in __init__
    errread,

  File "C:\Program Files (x86)\Anaconda2\lib\subprocess.py", line 959, in _execute_child
    env,

WindowsError: [Error 2] The system cannot find the file specified

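The command object is defined in the function below. The print statements I added to check the values do not appear in the console before the error occurs, and the error propagates at if config: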

    def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
        '''
        runs the command:
            `tesseract_cmd` `input_filename` `output_filename_base`

        returns the exit status of tesseract, as well as tesseract's stderr output

        '''
        print tesseract_cmd
        print input_filename
        print output_filename_base
        command = [tesseract_cmd, input_filename, output_filename_base]

        print config
        if lang is not None:
            command += ['-l', lang]

        if boxes:
            command += ['batch.nochop', 'makebox']

        if config:
            command += shlex.split(config)

        print command
        proc = subprocess.Popen(command,
                stderr=subprocess.PIPE)
        return (proc.wait(), proc.stderr.read())
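To see how this function can fail with Error 2 before tesseract ever runs, here is a self-contained sketch that mirrors the command-building logic above and launches it with a deliberately nonexistent executable. build_command and try_launch are hypothetical helper names, and no_such_tesseract_binary is a made-up executable name; on Windows under Python 2, the OSError caught here would be the WindowsError from the traceback.

```python
import errno
import shlex
import subprocess


def build_command(tesseract_cmd, input_filename, output_filename_base,
                  lang=None, boxes=False, config=None):
    """Assemble the argument list the same way run_tesseract does."""
    command = [tesseract_cmd, input_filename, output_filename_base]
    if lang is not None:
        command += ['-l', lang]
    if boxes:
        command += ['batch.nochop', 'makebox']
    if config:
        command += shlex.split(config)
    return command


def try_launch(command):
    """Return None if the process starts, else the errno of the OSError."""
    try:
        proc = subprocess.Popen(command, stderr=subprocess.PIPE)
    except OSError as exc:
        # Raised when command[0] cannot be resolved to an executable.
        return exc.errno
    proc.communicate()
    return None


command = build_command('no_such_tesseract_binary', 'phototest.tif', 'out', lang='fra')
print(command)
print(try_launch(command) == errno.ENOENT)
```

The point of the sketch is that the failure depends only on the first element of the list: the input image and the output base name are never examined if the executable itself cannot be found.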

1 answer:

Answer 0 (score: -2)

I ran into the same problem, and this is what I did:

Console