Question

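I am getting strange output in my images: every character ends up bordered by gray pixels. I am 90% sure this is caused by the OpenCV-to-PIL conversion, but I don't know how to fix it.

Here is the source image:

The output (you need to zoom in to see the gray pixels):

Here is a detail:

This is the code I am using: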

import cv2
import tesserocr as tr
from PIL import Image
import os

src = (os.path.expanduser('~\\Desktop\\output4\\'))

causali = os.listdir(src)  # build the list of input files
causali.sort(key=lambda x: int(x.split('.')[0]))

for file in enumerate(causali):  # file is an (index, filename) pair

    cv_img = cv2.imread(os.path.expanduser('~\\Desktop\\output4\\{}'.format(file[1])), cv2.IMREAD_UNCHANGED)

    # since tesserocr accepts PIL images, converting opencv image to pil
    pil_img = Image.fromarray(cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB))

    # initialize api
    api = tr.PyTessBaseAPI()
    try:
        # set pil image for ocr
        api.SetImage(pil_img)
        # Google tesseract-ocr has a page segmentation method(psm) option for specifying ocr types
        # psm values can be: block of text, single text line, single word, single character etc.
        # api.GetComponentImages method exposes this functionality
        # function returns:
        # image (:class:`PIL.Image`): Image object.
        # bounding box (dict): dict with x, y, w, h keys.
        # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
        # paragraph id (int): textline paragraph id within its block (if paraids is True).
        # ``None`` otherwise.
        boxes = api.GetComponentImages(tr.RIL.BLOCK, True)
        # get text
        text = api.GetUTF8Text()
        # iterate over returned list, draw rectangles
        for (im, box, _, _) in boxes:
            x, y, w, h = box['x'], box['y'], box['w'], box['h']

            cv_rect = cv2.rectangle(cv_img, (x-10, y-10), (x + w+10, y + h+10), color=(255, 255, 255), thickness=1)

            im.save(os.path.expanduser('~\\Desktop\\output5\\{}.png').format(file[0]))

    finally:
        api.End()

Is there a way to make api.SetImage() accept an OpenCV variable?

Thanks

Edit: is there a way to remove all the gray pixels by specifying a color?

Answer 1

You need to use a binary thresholding algorithm to filter out the "noise" in your image.

C++ docs

Python docs

Answer 2

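So, this was my solution. I found a way to crop with OpenCV instead of PIL, since the former does not convert the image to JPEG along the way. That gives us a clean image from input to output.

Here is the code: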

import cv2
import tesserocr as tr
from PIL import Image
import os

cv_img = cv2.imread('C:\\Users\\Link\\Desktop\\0.png', cv2.IMREAD_UNCHANGED)

idx = 0

# since tesserocr accepts PIL images, converting opencv image to pil
pil_img = Image.fromarray(cv_img)

# initialize api
api = tr.PyTessBaseAPI()
try:
    # set pil image for ocr
    api.SetImage(pil_img)
    # Google tesseract-ocr has a page segmentation method(psm) option for specifying ocr types
    # psm values can be: block of text, single text line, single word, single character etc.
    # api.GetComponentImages method exposes this functionality
    # function returns:
    # image (:class:`PIL.Image`): Image object.
    # bounding box (dict): dict with x, y, w, h keys.
    # block id (int): textline block id (if blockids is ``True``). ``None`` otherwise.
    # paragraph id (int): textline paragraph id within its block (if paraids is True).
    # ``None`` otherwise.
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
    # get text
    text = api.GetUTF8Text()
    # iterate over returned list, draw rectangles
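    for idx, (im, box, _, _) in enumerate(boxes):

        x, y, w, h = box['x'], box['y'], box['w'], box['h']

        # cv2.rectangle draws on cv_img in place and returns the same array
        cv_rect = cv2.rectangle(cv_img, (x-10, y-10), (x + w+10, y + h+10), color=(255, 255, 255), thickness=1)

        # crop the text line straight from the OpenCV array (rows first, then columns)
        roi = cv_rect[y:y + h, x:x + w]

        # one PNG per text line, so later crops do not overwrite earlier ones
        cv2.imwrite(os.path.expanduser('~\\Desktop\\output5\\{}.png'.format(idx)), roi)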

finally:
    api.End()

PIL generating gray pixels in OpenCV image
