Pytesseract or Keras OCR to extract text from images

Asked: 2021-05-18 11:49:21

Tags: keras deep-learning ocr tesseract python-tesseract

I am trying to extract text from an image, but currently I get an empty string as output. Below is my pytesseract code, though I am also open to using Keras OCR:

from PIL import Image
import pytesseract

path = 'captcha.svg.png'
img = Image.open(path)
captchaText = pytesseract.image_to_string(img, lang='eng', config='--psm 6')
print(captchaText)  # currently prints an empty string

I was not sure how to work with SVG images, so I converted them to PNG. Below are some sample images:

(SVG images converted to PNG; four sample captcha images omitted)

Edit 1 (2021-05-19): I was able to convert the SVG files to PNG using cairosvg. Still unable to read the captcha text.

Edit 2 (2021-05-20): Keras OCR also returns nothing for these images.

1 Answer:

Answer 0 (score: 0)

The reason keras-ocr fails or returns nothing is the grayscale image mode (I found that it works fine on RGB images). See below:

from PIL import Image

a = Image.open('/content/gD7vA.png')  # keras-ocr returns nothing for this file
a.mode, a.split()  # mode: grayscale + transparency / alpha layer (LA)

b = Image.open('/content/CYegU.png')  # keras-ocr returns a result for this file
b.mode, b.split()  # mode: RGB + transparency / alpha layer (RGBA)

Above, a is one of the files you mentioned in your question; as shown, its mode is grayscale plus a transparency/alpha layer (LA). b is a file I converted to RGB/RGBA. The transparency layer was already present in your original file; I kept it rather than removing it, but it does not seem necessary, so either way should be fine. In short, to make your inputs work with keras-ocr, first convert the files to RGB (or RGBA) and save them to disk, then pass them to the OCR pipeline.

# Using PIL to convert one mode to another
# and save the result on disk
c = Image.open('/content/gD7vA.png').convert('RGBA')
c.save('....png')  # output path elided in the original
c.mode, c.split()

('RGBA',
 (<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A410>,
  <PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A590>,
  <PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A810>,
  <PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A110>))
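If the alpha channel itself causes trouble for an OCR engine, another option is to composite the image onto a white background and drop transparency entirely. This is a sketch assuming Pillow; `flatten_to_rgb` is a hypothetical helper, not part of the answer above:

```python
from PIL import Image

def flatten_to_rgb(in_path, out_path):
    """Composite an image with transparency onto a white background
    and save it as plain RGB (hypothetical helper)."""
    img = Image.open(in_path).convert('RGBA')
    background = Image.new('RGBA', img.size, (255, 255, 255, 255))
    flat = Image.alpha_composite(background, img).convert('RGB')
    flat.save(out_path)
    return flat
```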

Full code

import matplotlib.pyplot as plt
import keras_ocr

# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()

# Get a set of four example images
images = [
         keras_ocr.tools.read(url) for url in [
            '/content/CYegU.png', # mode: RGBA; Only RGB should work too!
            '/content/bw6Eq.png', # mode: RGBA; 
            '/content/jH2QS.png', # mode: RGBA
            '/content/xbADG.png'  # mode: RGBA
    ]
]

# Each list of predictions in prediction_groups is a list of
# (word, box) tuples.
prediction_groups = pipeline.recognize(images)
# Log output while loading the pretrained weights:
#   Looking for /root/.keras-ocr/craft_mlt_25k.h5
#   Looking for /root/.keras-ocr/crnn_kurapan.h5

prediction_groups
[[('zum', array([[ 10.658852,  15.11916 ],
          [148.90204 ,  13.144257],
          [149.39563 ,  47.694347],
          [ 11.152428,  49.66925 ]], dtype=float32))],
 [('sresa', array([[  5.,  15.],
          [143.,  15.],
          [143.,  48.],
          [  5.,  48.]], dtype=float32))],
 [('sycw', array([[ 10.,  15.],
          [149.,  15.],
          [149.,  49.],
          [ 10.,  49.]], dtype=float32))],
 [('vdivize', array([[ 10.407883,  13.685192],
          [140.62648 ,  16.940662],
          [139.82323 ,  49.070583],
          [  9.604624,  45.815113]], dtype=float32))]]
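Since each entry in `prediction_groups` is a list of `(word, box)` tuples, the recognized captcha strings can be pulled out with a simple comprehension (a sketch; `extract_words` is an assumed helper name):

```python
def extract_words(prediction_groups):
    """Join the recognized words for each image into one string per image."""
    return [' '.join(word for word, box in preds) for preds in prediction_groups]

# With the predictions shown above, this yields:
# ['zum', 'sresa', 'sycw', 'vdivize']
```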

Display

# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for ax, image, predictions in zip(axs, images, prediction_groups):
    keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax)
