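I have a multi-page .tif file from which I need to extract text. I am trying to apply a Gaussian blur to improve its quality and then run Tesseract OCR to extract the text. Applying the Gaussian blur raises this error:
TypeError: src data type = 0 is not supported
Code: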
from PIL import Image, ImageSequence
from tesserocr import PyTessBaseAPI
import numpy as np
import pycountry
import cv2

with PyTessBaseAPI() as api:
    img = Image.open('sample.tif')
    for i, page in enumerate(ImageSequence.Iterator(img)):
        page2 = np.asarray(page)
        # Gaussian Blur
        imgG = cv2.GaussianBlur(page2, (5,5), 0)  # <---- ERROR
        # Tesseract OCR
        api.SetImage(imgG)
        text = api.GetUTF8Text()