Question

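I am building a label-detection tool that automatically identifies images by their equipment number (for example, 19-V1083AI) and sorts the images alphabetically. After locating the outline of the equipment label, I use the pytesseract library to convert the image to a string. The code runs without errors, but it never outputs the equipment number. This is my first time using the pytesseract library and the goodFeaturesToTrack function. Any help would be greatly appreciated!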

Original Image

import numpy as np
import cv2
import imutils  # resize images
import pytesseract # convert img to string
from matplotlib import pyplot as plt
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Read the image file
image = cv2.imread('Car Images/s3.JPG')

# Resize the image - change width to 500
image = imutils.resize(image, width=500)


# Display the original image
cv2.imshow("Original Image", image)
cv2.waitKey(0)

# BGR to grayscale conversion (cv2.imread loads images in BGR order)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("1 - Grayscale Conversion", gray)
cv2.waitKey(0)

# Noise removal with a bilateral filter (removes noise while preserving edges)
gray = cv2.bilateralFilter(gray, 11, 17, 17)
cv2.imshow("2 - Bilateral Filter", gray)
cv2.waitKey(0)


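# Detect up to 60 strong corners (quality level 0.001, minimum distance 10 px)
corners = cv2.goodFeaturesToTrack(gray, 60, 0.001, 10)

# np.int0 was removed in NumPy 2.0; cast to a plain integer array instead
corners = corners.astype(int)

for i in corners:
    x, y = (int(v) for v in i.ravel())
    # Mark each corner with a single blue pixel (BGR 255, 0, 0)
    cv2.circle(image, (x, y), 0, 255, -1)
    # Coordinates of every pixel marked so far
    coord = np.where(np.all(image == (255, 0, 0), axis=-1))
# matplotlib expects RGB, so convert from OpenCV's BGR before displaying
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))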

# Use tesseract to convert the image into a string
text = pytesseract.image_to_string(image, lang='eng')
print("Equipment Number is:", text)


plt.show()

Output Image 2

Note: the code works on only one of the images, not on the others.

Answer 1

I found that a specific PyTesseract configuration option finds your text, along with some noise. The configuration options are explained here: https://stackoverflow.com/a/44632770/42346

For this task I chose page segmentation mode 11: "Sparse text. Find as much text as possible in no particular order."

Because PyTesseract now returns more "text" than you need, you can use a regular expression to filter out the noise.

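This particular regular expression looks for two digits, a hyphen, five word characters (letters, digits, or underscores), a space, and then two more word characters. You can adjust it to match your equipment number format, but I am reasonably confident it is a good solution here, because nothing else in the returned text resembles an equipment number.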
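To see what this pattern accepts and rejects before running it on real OCR output, here is a minimal sketch. The sample strings are made up for illustration and are not taken from the image.

```python
import re

# Same pattern as described above: two digits, a hyphen, five word
# characters, a space, then two word characters.
EQUIPMENT_RE = re.compile(r'^\d{2}-\w{5} \w{2}$')

# Hypothetical OCR lines, invented for illustration only.
sample_lines = ['19-V1083 AI', 'WARNING', '19-V1083AI', '2019-V1083 AI', '']

# Keep only the lines that match the whole pattern.
matches = [line for line in sample_lines if EQUIPMENT_RE.match(line)]
print(matches)
```

Only the first sample passes. The variant without a space and the one with a four-digit prefix are rejected, so the pattern may need loosening if your labels are not printed consistently.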

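import re
import cv2
import pytesseract

image = cv2.imread('Fv0oe.jpg')
# --psm 11: sparse text, find as much text as possible in no particular order
text = pytesseract.image_to_string(image, lang='eng', config='--psm 11')

# Keep only the lines that look like an equipment number, e.g. "19-V1083 AI"
for line in text.split('\n'):
    if re.match(r'^\d{2}-\w{5} \w{2}$', line):
        print(line)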

Result (no image preprocessing needed)

19-V1083 AI
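Since the stated goal is to sort images by equipment number, the filtered result could feed a sort step along these lines. This is only a sketch under assumptions: the helper names `find_equipment_number` and `sort_by_equipment_number`, the file names, and the OCR strings are all hypothetical, and the optional space in the pattern is my own loosening so that "19-V1083 AI" and "19-V1083AI" are treated as the same number. In practice each text value would come from the `pytesseract.image_to_string(..., config='--psm 11')` call used in the code above.

```python
import re

# Optional space before the two-character suffix (an assumption), so both
# "19-V1083 AI" (OCR output) and "19-V1083AI" (label as written) match.
EQUIPMENT_RE = re.compile(r'^(\d{2}-\w{5}) ?(\w{2})$')


def find_equipment_number(ocr_text):
    """Return the first equipment number in the OCR text, space removed, or None."""
    for line in ocr_text.splitlines():
        m = EQUIPMENT_RE.match(line.strip())
        if m:
            return m.group(1) + m.group(2)
    return None


def sort_by_equipment_number(ocr_results):
    """Sort file names in {file name: OCR text}; files with no match go last."""
    keyed = [(find_equipment_number(text), name) for name, text in ocr_results.items()]
    found = sorted((key, name) for key, name in keyed if key is not None)
    missing = sorted(name for key, name in keyed if key is None)
    return [name for _, name in found] + missing


# Hypothetical file names and OCR text, for illustration only.
ocr_results = {
    's3.JPG': 'noise\n19-V1083 AI\n',
    's1.JPG': '07-P2210 BX\nmore noise',
    's2.JPG': 'nothing useful here',
}
print(sort_by_equipment_number(ocr_results))
```

Files where no equipment number is found are placed at the end rather than dropped, which makes it easier to spot the images that still need preprocessing.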
