Question

我想从下面的图像中删除颜色，由于这种颜色，我无法从图像中清楚地提取文本。

我正在使用下面的代码，但是我没有得到清晰的文字，

import numpy as np
from PIL import Image

im = Image.open('my_file.tif')
im = im.convert('RGBA')
data = np.array(im)
# just use the rgb values for comparison
rgb = data[:,:,:3]
color = [246, 213, 139]   # Original value
black = [0,0,0, 255]
white = [255,255,255,255]
mask = np.all(rgb == color, axis = -1)
# change all pixels that match color to white
data[mask] = white

# change all pixels that don't match color to black
##data[np.logical_not(mask)] = black
new_im = Image.fromarray(data)
new_im.save('new_file.tif')

和

def black_and_white(input_image_path,
                output_image_path):
color_image = Image.open(input_image_path)
bw = color_image.convert('L')
bw.save(output_image_path)

请帮助我...

图片2：

Answer 1

我假设您要提取报价。为此，您可以执行一系列过滤操作以删除非文本轮廓。获得处理结果后，可以使用Pytesseract等OCR工具提取文本。

OCR的结果

On behalf of the hundreds of ACLU activists who
called on Governor Walker to veto House Bill
156, we are disappointed that he did not put
students or the Constitution first today.”
—Joshua A. Decker
Executive Director

代码

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image and threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Connect text with a horizontal shaped kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10,3))
dilate = cv2.dilate(thresh, kernel, iterations=3)

# Remove non-text contours using aspect ratio filtering
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    aspect = w/h
    if aspect < 3:
        cv2.drawContours(thresh, [c], -1, (0,0,0), -1)

# Invert image and OCR
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)

cv2.imshow('result', result)
cv2.waitKey()

Answer 2

尝试进行OpenCV转换，但请记住使用3个通道，否则会出错

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

去除图像的颜色

2 个答案: