Problem using Tesseract-OCR with Python

Date: 2018-06-07 05:30:37

Tags: python anaconda tesseract

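I'm new to programming and I'm trying to use Tesseract OCR to read the text in an image, but I can't get it to work! I installed tesseract_OCR, pytesseract and Pillow in my environment. Does anyone have any tips?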

Input:

from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open('phrase.jpg')))

Output:

C:\Anaconda2\envs\ambiente36\python.exe C:/Users/Simone/Desktop/curso_programacao/Ler_imagens/ler_imagens

Traceback (most recent call last):
  File "C:\Anaconda2\envs\ambiente36\lib\site-packages\pytesseract\pytesseract.py", line 194, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Anaconda2\envs\ambiente36\lib\site-packages\pytesseract\pytesseract.py", line 165, in run_tesseract
    proc = subprocess.Popen(command, **subprocess_args())
  File "C:\Anaconda2\envs\ambiente36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Anaconda2\envs\ambiente36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] O sistema não pode encontrar o arquivo especificado

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Simone/Desktop/curso_programacao/Ler_imagens/ler_imagens", line 6, in <module>
    phrase = pytesseract.image_to_string(Image.open('phrase.jpg'))
  File "C:\Anaconda2\envs\ambiente36\lib\site-packages\pytesseract\pytesseract.py", line 286, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "C:\Anaconda2\envs\ambiente36\lib\site-packages\pytesseract\pytesseract.py", line 201, in run_and_get_output
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path
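The traceback shows what happens underneath: pytesseract starts the separate `tesseract` program with `subprocess.Popen`, Windows answers with `FileNotFoundError: [WinError 2]` (the Portuguese text is the localized "The system cannot find the file specified"), and pytesseract re-raises that as `TesseractNotFoundError`. A quick way to check whether the Python interpreter can see the program at all is a standard-library sketch like this one. It assumes the executable is named `tesseract` (pytesseract's default command); the helper names are hypothetical, and stderr is merged into the output because, depending on the version, `tesseract --version` may print to either stream.

```python
import shutil
import subprocess


def find_tesseract(name="tesseract"):
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    return shutil.which(name)


def tesseract_version(path):
    """Run `<path> --version` and return its combined stdout/stderr as text."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some versions write the version to stderr
        universal_newlines=True,   # text mode; also works on Python 3.6
    )
    return result.stdout


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract is not on PATH: pytesseract will raise TesseractNotFoundError")
    else:
        print("Found:", exe)
        print(tesseract_version(exe))
```

If this reports that tesseract is not on PATH, installing the pytesseract package alone will not help: the Tesseract program itself is missing or unreachable.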

3 answers:

Answer 0 (score: 1)

Here are the steps you should follow to configure tesseract in your environment.

First install Python and pip (the steps are here), then install Pillow and pytesseract (here).

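from PIL import Image
from pytesser.pytesser import *  # pytesser is a separate, older wrapper, not pytesseract

image_file = "FULL/PATH/TO/YOUR/IMAGE/image.png"
im = Image.open(image_file)

# Alternative ways to get the text; each line overwrites `text`
text = image_to_string(im)                                      # from a PIL image
text = image_file_to_string(image_file)                         # from the file path
text = image_file_to_string(image_file, graceful_errors=True)   # more forgiving with unsupported formats
print("=====output=======\n")
print(text)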

Download link for pytesseract. You can find a complete example here.
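Because the question runs the script with `C:\Anaconda2\envs\ambiente36\python.exe`, it is worth confirming that Pillow and pytesseract are installed for that exact interpreter and not for another environment. A minimal standard-library check could look like this; `missing_packages` is a hypothetical helper, and note that Pillow is imported under the name `PIL`.

```python
import importlib.util
import sys


def missing_packages(names=("PIL", "pytesseract")):
    """Return the top-level import names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

If both packages are found but the `TesseractNotFoundError` persists, the problem is the Tesseract program, not the Python packages.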

Answer 1 (score: 0)

It seems that Tesseract is not installed correctly, or that the path to tesseract does not point to where tesseract is actually installed:

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

I suggest you first check your installation against the official documentation.
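If Tesseract is installed but its folder is not on `PATH`, one alternative to changing the system settings is to prepend the install folder to `PATH` for the running Python process only, before calling pytesseract, so that the default `tesseract` command can be resolved. This is a sketch under assumptions: `prepend_to_path` is a hypothetical helper, and the folder used below is only an example, so replace it with the folder that actually contains `tesseract.exe` on your machine.

```python
import os
import shutil


def prepend_to_path(folder):
    """Put `folder` first on PATH for this process (and the child processes it starts)."""
    current = os.environ.get("PATH", "")
    parts = [p for p in current.split(os.pathsep) if p and p != folder]
    os.environ["PATH"] = os.pathsep.join([folder] + parts)
    return os.environ["PATH"]


if __name__ == "__main__":
    # Example folder only: adjust it to your own Tesseract installation
    prepend_to_path(r"C:\Program Files (x86)\Tesseract-OCR")
    print("tesseract now resolves to:", shutil.which("tesseract"))
```

The change lasts only as long as the script runs, which keeps the system `PATH` untouched.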

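I recently wrote a very simple guide to Tesseract; it should be enough to get you writing your first OCR script and past some of the hurdles I ran into where the documentation was unclear.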

If you'd like to take a look, here are the links:

Answer 2 (score: 0)

You need to install tesseract with the Windows installer available here. Then install the Python wrapper with:

pip install pytesseract

Then, after importing the pytesseract library, also set the tesseract path in your script as follows (note that the installation path may be different in your case!):

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
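Since the installation path can differ between machines, a slightly more defensive variant is to look in a few likely places and fail with a clear message when none exists. The candidate paths are assumptions (the `(x86)` one comes from this answer; the other is a guess at a 64-bit default), and `locate_tesseract` is a hypothetical helper, not part of pytesseract.

```python
import os
import shutil

# Assumed install locations; edit this tuple for your machine
CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def locate_tesseract(candidates=CANDIDATES, name="tesseract"):
    """Return tesseract's path from PATH, else the first existing candidate file."""
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "tesseract not found on PATH or in: " + ", ".join(candidates)
    )
```

After `import pytesseract`, you would then write `pytesseract.pytesseract.tesseract_cmd = locate_tesseract()` instead of hard-coding a single path.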

Note: this has been tested on Anaconda3 without any problems.