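I am new to Tesseract OCR. I have a batch of payslip images and want to extract the date from each one automatically. How can I do this?
As a first step, I tried to extract the text from a single payslip, but it fails with an error: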
import cv2
import pytesseract
from PIL import Image  # needed for Image.fromarray below

img = cv2.imread(r'E:/Receipts/Receipts/0a0ebd53.jpeg')
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
TESSDATA_PREFIX='C:/Program Files/Tesseract-OCR/tessdata'
print(pytesseract.image_to_string(img))
# Or convert to a PIL image explicitly beforehand
print(pytesseract.image_to_string(Image.fromarray(img)))
Error:
200 }
201
--> 202 run_tesseract(**kwargs)
203 filename = kwargs['output_filename_base'] + os.extsep + extension
204 with open(filename, 'rb') as output_file:
~\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice)
176
177 if status_code:
--> 178 raise TesseractError(status_code, get_errors(error_string))
179
180 return True
TesseractError: (1, 'Error opening data file C:\\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
Please help me resolve this error. Suggestions for a deep learning model would also be welcome.
Answer 0 (score: 0)
Read the image with the PIL library, then pass the image object to image_to_string(img_obj), as shown below.
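from PIL import Image
import pytesseract

# Adjust to your own install location (this is the path used in the question).
pytesseract.pytesseract.tesseract_cmd = r"C:/Program Files/Tesseract-OCR/tesseract.exe"
image_path = r"E:/Receipts/Receipts/0a0ebd53.jpeg"  # the image from the question
image_obj = Image.open(image_path)
print(pytesseract.image_to_string(image_obj))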
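The error message itself points at the language data rather than at how the image is loaded: Tesseract looked for eng.traineddata directly under C:\Program Files (x86)\Tesseract-OCR\ instead of in a tessdata folder. Assigning a Python variable named TESSDATA_PREFIX does not set the environment variable. One option that may help is to pass the tessdata folder through the config argument using Tesseract's --tessdata-dir option. The sketch below does that and then pulls date-like strings out of the OCR text with a regular expression. The folder path, the helper names (tessdata_config, extract_dates, dates_from_image) and the date pattern are all assumptions for illustration; the pattern only covers numeric dates such as 12/03/2019 or 1-2-19 and will need adjusting to the formats on your payslips. It also assumes tesseract_cmd has already been set as above.

```python
import re

# Assumed install location; change it to wherever your tessdata folder actually is.
TESSDATA_DIR = r"C:/Program Files/Tesseract-OCR/tessdata"

# Hypothetical pattern: numeric dates with one consistent separator (/, . or -),
# for example 12/03/2019, 03.04.2020 or 1-2-19.
DATE_PATTERN = re.compile(r"\b\d{1,2}([/.-])\d{1,2}\1(?:\d{4}|\d{2})\b")


def tessdata_config(tessdata_dir=TESSDATA_DIR):
    """Build a config string that tells Tesseract where the language data is.

    The double quotes matter because the path may contain spaces.
    """
    return f'--tessdata-dir "{tessdata_dir}"'


def extract_dates(text):
    """Return every date-like substring found in the OCR text, in order."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]


def dates_from_image(image_path, tessdata_dir=TESSDATA_DIR):
    """Run OCR on one image and return the date-like strings it contains."""
    # Imported here so the text helpers above can be used without Tesseract.
    import pytesseract
    from PIL import Image

    text = pytesseract.image_to_string(
        Image.open(image_path),
        config=tessdata_config(tessdata_dir),
    )
    return extract_dates(text)
```

Calling dates_from_image(r"E:/Receipts/Receipts/0a0ebd53.jpeg") should then return a list of strings for that payslip. The matches are returned as raw text because a numeric date alone does not say whether it is day-first or month-first.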