Question

My goal is to detect the characters in images like this one.

I need to improve the image so that Tesseract can recognize it better, which probably requires the following steps:

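Rotate the image so that the blue rectangle is horizontal [need help with this]
Crop the image to the blue rectangle [need help with this]
Apply a threshold filter and a Gaussian blur
Detect the characters with Tesseract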

import cv2
import numpy as np
import pytesseract
from PIL import Image

img = Image.open('grid.jpg')
# PIL gives RGB; reverse the channels to get OpenCV's BGR order
image = np.array(img.convert("RGB"))[:, :, ::-1].copy()


# Need to rotate the image here and fill the blanks
# Need to crop the image here

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu's thresholding (the threshold value 0 is ignored; Otsu picks it)
ret3, th3 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Gaussian blur
blur = cv2.GaussianBlur(th3, (5, 5), 0)

# Save the image
cv2.imwrite("preprocessed.jpg", blur)

# Apply the OCR
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
# --psm 6: assume a single uniform block of text
tessdata_dir_config = r'--tessdata-dir "C:/Program Files (x86)/Tesseract-OCR/tessdata" --psm 6'

preprocessed = Image.open('preprocessed.jpg')
boxes = pytesseract.image_to_data(preprocessed, config=tessdata_dir_config)

This is the output image I get, which is not good enough for OCR:

OCR problems:

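The blue rectangle is sometimes recognized as characters, which is why I want to crop the image
Sometimes Tesseract recognizes the characters in a row as one word (GCVDRTEUQCEBURSIDEEC) and sometimes as individual letters. I would like it to always be one word.
The small pyramid in the bottom-right corner is recognized as a character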

Any other suggestions for improving recognition are welcome.

Answer 1

Here's an idea for the next steps...

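Convert to HSV, then start from each corner and work inwards towards the middle of the picture, looking for the pixel nearest each corner that is reasonably saturated and has a hue matching the surrounding blue rectangle. That gives you the 4 points marked in red.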

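Now use a perspective transform to move each of these points to a corner, which straightens the image. I used ImageMagick, but you should be able to see that I move the top-left red point at coordinates (210,51) to the top-left corner of the new image at (0,0). Likewise, the top-right red point at (1754,19) moves to (2064,0). The ImageMagick command in the terminal is: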

convert wordsearch.jpg \
  -distort perspective '210,51,0,0 1754,19,2064,0 238,1137,0,1161 1776,1107,2064,1161' result.jpg

The result is:

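The next problem is uneven lighting: the bottom-left corner is darker than the rest of the image. To compensate, I clone the image and blur it to remove the high frequencies (a simple box blur or box average is fine), so that it represents only the slowly varying illumination. I then take the difference between the image and this blurred copy, which effectively removes the background variation and keeps only the high-frequency content, such as your letters. Finally, I normalize the result so that whites become fully white and blacks fully black, and threshold at 50%.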

convert result.jpg -colorspace gray \( +clone -blur 50x50 \) \
   -compose difference -composite  -negate -normalize -threshold 50% final.jpg

If you know the font and the letters, the result should be good for template matching; if you don't, it should still be good for OCR.

Answer 2

Here are my steps for recognizing the characters:

(1) detect the blue in HSV space, approximate the inner blue contour and sort the corner points
(2) find the perspective transform matrix and apply the perspective transform
(3) threshold it (and find the characters)
(4) use `mnist` algorithms to recognize the characters

Step (1): find the corners of the blue rectangle

For picking the HSV range, see: Choosing the correct upper and lower HSV boundaries for color detection with `cv::inRange` (OpenCV)

Step (2): crop

Step (3): threshold (and find the characters)

Step (4): work in progress...

Answer 3

Here's a slightly different approach using pyvips.

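If the image is only rotated (i.e. there is little or no perspective distortion), you can use an FFT to find the rotation angle. A nice, regular grid of characters produces a clear set of lines in the transform, so this should be very robust. This takes the FFT of the whole image, but you could shrink the image a bit first if you want more speed.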

import sys
import pyvips

image = pyvips.Image.new_from_file(sys.argv[1])

# to monochrome, take the fft, wrap the origin to the centre, get magnitude
fft = image.colourspace('b-w').fwfft().wrap().abs()

This makes:

To find the angle of the lines, convert from polar to rectangular coordinates and look for horizontal lines:

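def to_rectangular(image):
    # each output pixel holds its own (x, y) coordinate
    xy = pyvips.Image.xyz(image.width, image.height)
    # treat x as the radius and scale y to an angle in degrees (0 to 360)
    xy *= [1, 360.0 / image.height]
    # convert those (radius, angle) pairs to rectangular (x, y) positions
    index = xy.rect()
    # shrink the radii so the largest one fits inside the image
    scale = min(image.width, image.height) / float(image.width)
    index *= scale / 2.0
    # and move the origin to the centre of the FFT
    index += [image.width / 2.0, image.height / 2.0]
    # resample: output x is distance from the centre, output y is the angle
    return image.mapim(index)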

# sum of columns, sum of rows
cols, rows = to_rectangular(fft).project()

This makes:

Projected:

Then just look for the peak and rotate:

# blur the rows projection a bit, then get the maxpos
v, x, y = rows.gaussblur(10).maxpos()

# and turn to an angle in degrees we should counter-rotate by
angle = 270 - 360 * y / rows.height

image = image.rotate(angle)
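If pyvips isn't an option, the same idea can be approximated with NumPy alone. This is a swapped-in variant, not a port of the code above: instead of remapping the spectrum to polar coordinates and projecting, it sums the FFT magnitude along rays from the centre for each candidate angle and keeps the strongest, scoring each angle together with the angle 90° away because a grid has lines in both directions. The Hann window, the 0.5° step, the radius band and the name `grid_angle` are assumptions.

```python
import numpy as np


def grid_angle(gray, step=0.5, r_min=0.05, r_max=0.45):
    """Estimate the orientation of a regular grid, in degrees in [-45, 45).

    Angles are measured with x to the right and y pointing down. Radii are
    fractions of the smaller image side; all defaults are assumptions.
    """
    g = np.asarray(gray, dtype=np.float64)
    g = g - g.mean()                       # drop the DC peak
    # Hann window so the image borders don't add a strong 0/90 degree cross
    win = np.outer(np.hanning(g.shape[0]), np.hanning(g.shape[1]))
    mag = np.abs(np.fft.fftshift(np.fft.fft2(g * win)))
    cy, cx = g.shape[0] // 2, g.shape[1] // 2   # zero frequency after shift
    n = min(g.shape)
    radii = np.arange(r_min * n, r_max * n)
    best_angle, best_score = 0.0, -1.0
    for a in np.arange(-45.0, 45.0, step):
        score = 0.0
        # a grid has spectral peaks along a and a + 90 degrees
        for t in (np.deg2rad(a), np.deg2rad(a + 90.0)):
            rows = np.rint(cy + radii * np.sin(t)).astype(int)
            cols = np.rint(cx + radii * np.cos(t)).astype(int)
            score += mag[rows, cols].sum()
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle
```

Because y points down, a positive result means the grid is tilted clockwise on screen; check the sign on a test image before counter-rotating (for example with `cv2.getRotationMatrix2D` and `cv2.warpAffine`).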

To crop, I took horizontal and vertical projections again, then searched for the peaks where B > G:

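# sum each column and each row, per band (R, G, B)
cols, rows = image.project()

# columns and rows where blue clearly exceeds green (B - G)
h = (cols[2] - cols[1]) > 10000
v = (rows[2] - rows[1]) > 10000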

# search in from the edges for the first non-zero value
cols, rows = h.profile()
left = rows.avg()

cols, rows = h.fliphor().profile()
right = h.width - rows.avg()
width = right - left

cols, rows = v.profile()
top = cols.avg()

cols, rows = v.flipver().profile()
bottom = v.height - cols.avg()
height = bottom - top

# move the crop in by a margin
margin = 10
left += margin
top += margin
width -= 2 * margin
height -= 2 * margin

# and crop!
image = image.crop(left, top, width, height)
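A NumPy sketch of the same B > G crop, for an RGB array such as one loaded with Pillow (for an OpenCV BGR array, read blue from channel 0 instead of 2). It reuses the threshold of 10000 and the 10-pixel margin from the code above; both depend on image size, so treat them as assumptions for other photos. The helper name is hypothetical.

```python
import numpy as np


def crop_to_blue_frame(rgb, threshold=10000, margin=10):
    """Crop an RGB image to the blue frame, moved in by `margin` pixels.

    Projects B - G onto each axis, takes the first and last column/row
    whose sum exceeds `threshold`, then shrinks the box by `margin`.
    """
    arr = np.asarray(rgb)
    img = arr.astype(np.int64)                 # avoid uint8 wrap-around
    blue_excess = img[..., 2] - img[..., 1]    # B - G per pixel
    col_score = blue_excess.sum(axis=0)        # one value per column
    row_score = blue_excess.sum(axis=1)        # one value per row
    cols = np.flatnonzero(col_score > threshold)
    rows = np.flatnonzero(row_score > threshold)
    if cols.size == 0 or rows.size == 0:
        raise ValueError("no column/row passes the B > G threshold")
    left, right = cols[0] + margin, cols[-1] + 1 - margin
    top, bottom = rows[0] + margin, rows[-1] + 1 - margin
    return arr[top:bottom, left:right]
```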

This makes:

Finally, to remove the background, blur with a large radius and subtract:

image = image.colourspace('b-w').gaussblur(70) - image

This makes:

Answer 4

I think it's better to remove the color rather than crop.

This can be done with OpenCV, see: Devserver
