Remove colors from an image to keep only the text

Posted: 2019-12-05 08:58:02

Tags: python image numpy colors python-imaging-library

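The image below has a white background. Some of the text is black and some is red, and some of it sits on a red background. The positions of the text (with or without the background) are not fixed.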

I want to copy only the text into a new image.

[Image: input image]

One approach I thought of is to replace the red background with white, but that inevitably removes the red text as well.

Here is what I tried:

from PIL import Image
import numpy as np

orig_color = (255, 0, 0)             # pure red, the color to remove
replacement_color = (255, 255, 255)  # white
img = Image.open("C:\\TEM\\AB.png").convert('RGB')
data = np.array(img)
# replace every pixel whose R, G and B values all equal orig_color
data[(data == orig_color).all(axis=-1)] = replacement_color
img2 = Image.fromarray(data, mode='RGB')
img2.show()

This is the result:

[Image: result of the attempt]
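The red text is lost because the mask in the code above is computed per pixel: `(data == orig_color).all(axis=-1)` only asks "is this pixel pure red?", so a red background pixel and a red text pixel look identical. A minimal sketch on a hypothetical 1x4 row of pure-color pixels (the `RED`, `WHITE` and `BLACK` names are illustrative, not from the question) shows this:

```python
import numpy as np

RED = (255, 0, 0)
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

# hypothetical pixels: red background, red text, black text, white background
data = np.array([[RED, RED, BLACK, WHITE]], dtype=np.uint8)

# same masking step as the question: True where all three channels equal pure red
mask = (data == RED).all(axis=-1)
print(mask.tolist())  # [[True, True, False, False]]

# both red pixels are replaced, whether they were background or text
data[mask] = WHITE
print(data[0].tolist())
```

Telling the two apart therefore needs information beyond a single pixel's color, such as the size or shape of the red region it belongs to.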

What is the best way to keep only the text in the image? (The ideal result is shown below.)

Thanks.

[Image: ideal result]

1 Answer

Answer (score: 1)

Here is my approach, using only the red and green channels of the image (with OpenCV; see the comments in the code for an explanation):

import cv2
import imageio.v2 as imageio  # imageio.v2 keeps the classic imread/imsave API
import numpy as np

# extract the red and green channels (imageio loads images in RGB(A) order)
r, g = cv2.split(imageio.imread('https://i.stack.imgur.com/bMSzZ.png'))[:2]

imageio.imsave('r-channel.png', r)
imageio.imsave('g-channel.png', g)

# white image as canvas for drawing contours
canvas = np.ones(r.shape, np.uint8) * 255

# find contours in the inverted green channel, where red and black areas become white;
# OpenCV 3 returns (image, contours, hierarchy), OpenCV 2 and 4 return (contours, hierarchy)
found = cv2.findContours(255 - g, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
contours = found[0] if len(found) == 2 else found[1]

# keep only contours that are small (area <= 500) and have exactly 4 points
# (rectangular); larger or non-rectangular contours are discarded
contours = [
    cnt for cnt in contours
    if cv2.contourArea(cnt) <= 500 and len(cnt) == 4
]

# fill the kept contours with black (thickness -1) on the canvas
cv2.drawContours(canvas, contours, -1, 0, -1)

imageio.imsave('filtered-contours.png', canvas)

# combine the kept contours with the red channel using '&' to bring back the "AAA";
# use '|' with the green channel to remove contour edges around the "BBB"
# ('&' binds tighter than '|', so this is (canvas & r) | g)
result = (canvas & r) | g

imageio.imsave('result.png', result)

[Image: r-channel.png]

[Image: g-channel.png]

[Image: filtered-contours.png]

[Image: result.png]
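To see why the final combination works, here is a sketch on three hypothetical pure-color pixels (white, red, black, in RGB order). Outside the filled contours the canvas is 255, so the expression reduces to `r | g`, which turns red into white and keeps black. Inside the filled contours the canvas is 0, so it reduces to `g`, which turns red into black. Real images with anti-aliased edges will not be this clean.

```python
import numpy as np

# hypothetical 1x3 strip of pure colors in RGB order: white, red, black
pixels = np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 0]]], dtype=np.uint8)
r, g = pixels[..., 0], pixels[..., 1]

# outside the kept contours: canvas == 255, so (canvas & r) | g == r | g
canvas_outside = np.full(r.shape, 255, np.uint8)
outside = (canvas_outside & r) | g

# inside the kept (filled) contours: canvas == 0, so (canvas & r) | g == g
canvas_inside = np.zeros(r.shape, np.uint8)
inside = (canvas_inside & r) | g

print(outside.tolist())  # [[255, 255, 0]]  white stays, red -> white, black stays
print(inside.tolist())   # [[255, 0, 0]]    white stays, red -> black, black stays
```

Because Python's `&` binds more tightly than `|`, the unparenthesized `canvas & r | g` evaluates the same way.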


Update

Here is a more general solution, based on another example image you provided in chat:

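import cv2
import numpy as np

img = cv2.imread('example.png')
if img is None:
    raise FileNotFoundError('could not read example.png')

# start from an all-white result; each channel can only darken it
result = np.ones(img.shape[:2], np.uint8) * 255
for channel in cv2.split(img):
    canvas = np.ones(img.shape[:2], np.uint8) * 255
    # contours of every pixel darker than white in this channel;
    # OpenCV 3 returns (image, contours, hierarchy), OpenCV 2 and 4 return (contours, hierarchy)
    found = cv2.findContours(255 - channel, cv2.RETR_LIST,
                             cv2.CHAIN_APPROX_SIMPLE)
    contours = found[0] if len(found) == 2 else found[1]
    # keep only small contours (text); the size threshold may vary per image
    contours = [cnt for cnt in contours if cv2.contourArea(cnt) <= 100]
    cv2.drawContours(canvas, contours, -1, 0, -1)
    # inside kept contours the channel value survives; elsewhere it becomes 255
    result = result & (canvas | channel)

cv2.imwrite('result.png', result)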

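I no longer filter by contour length here, because that causes problems when other characters touch the rectangle. All channels of the image are used so that the approach works with different colors.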
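The per-channel idea can also be tried without OpenCV. The sketch below is a numpy-only approximation, not the answerer's code: it swaps `cv2.findContours`/`cv2.drawContours` for a hand-written 8-connected component labelling, and `small_blob_mask` and `keep_text` are hypothetical helper names. Size is measured as a pixel count, which is close to but not the same as `cv2.contourArea`, so thresholds are not directly interchangeable. The pure-Python flood fill is also slow on large images.

```python
from collections import deque

import numpy as np

# offsets of the 8 neighbours of a pixel
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def small_blob_mask(foreground, max_area):
    """Mark every 8-connected foreground component with at most max_area pixels."""
    h, w = foreground.shape
    seen = np.zeros((h, w), dtype=bool)
    keep = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            if not foreground[y, x] or seen[y, x]:
                continue
            # breadth-first flood fill collects one connected component
            component = [(y, x)]
            seen[y, x] = True
            queue = deque(component)
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and foreground[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        component.append((ny, nx))
                        queue.append((ny, nx))
            if len(component) <= max_area:
                for cy, cx in component:
                    keep[cy, cx] = True
    return keep


def keep_text(img, max_area=100):
    """Return a single-channel uint8 image: small dark shapes kept, large areas whitened."""
    result = np.full(img.shape[:2], 255, np.uint8)
    for c in range(img.shape[2]):
        channel = img[..., c]
        canvas = np.full(channel.shape, 255, np.uint8)
        # same foreground as findContours(255 - channel): any pixel below 255
        canvas[small_blob_mask(channel < 255, max_area)] = 0
        # inside small blobs the channel value survives; elsewhere it becomes 255
        result &= canvas | channel
    return result


if __name__ == "__main__":
    # hypothetical 20x20 RGB test image on a white background
    img = np.full((20, 20, 3), 255, np.uint8)
    img[0:12, 0:12] = (255, 0, 0)    # large red box (144 px)
    img[4:6, 4:6] = (0, 0, 0)        # black "text" inside the red box
    img[15:17, 15:18] = (255, 0, 0)  # small red "text" on white
    img[15:17, 2:4] = (0, 0, 0)      # small black "text" on white
    out = keep_text(img)
    print(int((out == 0).sum()))  # 14 dark pixels: 4 + 6 + 4
```

In the toy image the red box is whitened, while the black text inside it (recovered through the red channel) and the small red and black text on white (recovered through the green and blue channels) all end up black.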