I am trying to extract text from images, but currently I get an empty string as output. Below is my pytesseract code, although I am also open to Keras OCR:
from PIL import Image
import pytesseract
path = 'captcha.svg.png'
img = Image.open(path)
captchaText = pytesseract.image_to_string(img, lang='eng', config='--psm 6')  # currently returns an empty string
I am not sure how to work with SVG images, so I converted them to PNG. Below are some sample images:
Edit 1 (2021-05-19): I was able to convert the SVGs to PNG using cairosvg, but I still cannot read the captcha text.
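For reference, a minimal sketch of that cairosvg conversion, assuming the captcha is saved locally as captcha.svg (the file names are illustrative):
import cairosvg
# Rasterize the SVG captcha into a PNG that PIL/pytesseract can open
cairosvg.svg2png(url='captcha.svg', write_to='captcha.svg.png')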
Edit 2 (2021-05-20): Keras OCR also returns nothing for these images.
Answer 0 (score: 0)
The reason keras-ocr is not working or returns nothing for your images is that they are grayscale (I found that it works fine on RGB input). See below:
from PIL import Image
a = Image.open('/content/gD7vA.png')  # keras-ocr returns nothing for this image
a.mode, a.split()  # mode LA: single grayscale channel + transparency/alpha layer
b = Image.open('/content/CYegU.png')  # keras-ocr returns a result for this image
b.mode, b.split()  # mode RGBA: RGB channels + transparency/alpha layer
In the above, a is the file you mentioned in your question; as shown, it is grayscale plus a transparency/alpha layer (mode LA). b is a file I converted to RGB (or RGBA). The transparency layer is already part of your original file; I did not remove it, but keeping it does not seem necessary. In short, to make your input work with keras-ocr, first convert the files to RGB (or RGBA) and save them to disk, then pass them to the OCR.
# Using PIL to convert one mode to another
# and save on disk
c = Image.open('/content/gD7vA.png').convert('RGBA')
c.save(....png)
c.mode, c.split()
('RGBA',
(<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A410>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A590>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A810>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A110>))
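A short sketch of the conversion step described above, applied to all of the sample captchas in a loop; the file names match the ones used below, and the 'rgba_' output prefix is illustrative:
from PIL import Image
# Convert each captcha to RGBA and save the converted copy to disk
for name in ['gD7vA.png', 'CYegU.png', 'bw6Eq.png', 'jH2QS.png', 'xbADG.png']:
    img = Image.open('/content/' + name).convert('RGBA')
    img.save('/content/rgba_' + name)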
Full code
import matplotlib.pyplot as plt
import keras_ocr

# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()

# Get a set of four example images
images = [
    keras_ocr.tools.read(url) for url in [
        '/content/CYegU.png',  # mode: RGBA; RGB alone should work too
        '/content/bw6Eq.png',  # mode: RGBA
        '/content/jH2QS.png',  # mode: RGBA
        '/content/xbADG.png',  # mode: RGBA
    ]
]
# Each list of predictions in prediction_groups is a list of
# (word, box) tuples.
prediction_groups = pipeline.recognize(images)
Looking for /root/.keras-ocr/craft_mlt_25k.h5
Looking for /root/.keras-ocr/crnn_kurapan.h5
prediction_groups
[[('zum', array([[ 10.658852, 15.11916 ],
[148.90204 , 13.144257],
[149.39563 , 47.694347],
[ 11.152428, 49.66925 ]], dtype=float32))],
[('sresa', array([[ 5., 15.],
[143., 15.],
[143., 48.],
[ 5., 48.]], dtype=float32))],
[('sycw', array([[ 10., 15.],
[149., 15.],
[149., 49.],
[ 10., 49.]], dtype=float32))],
[('vdivize', array([[ 10.407883, 13.685192],
[140.62648 , 16.940662],
[139.82323 , 49.070583],
[ 9.604624, 45.815113]], dtype=float32))]]
Display
# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for ax, image, predictions in zip(axs, images, prediction_groups):
    keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax)
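Since the original goal was to get the captcha text as a plain string, here is a small follow-up sketch (assuming the prediction_groups shown above) that joins the recognized words per image; for these single-word captchas the word order does not matter:
# Collapse each image's (word, box) predictions into one string per captcha
captcha_texts = [''.join(word for word, box in predictions)
                 for predictions in prediction_groups]
print(captcha_texts)  # e.g. ['zum', 'sresa', 'sycw', 'vdivize']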