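I'm trying to extract text from an image with OpenCV in Python, but the results are wrong and consist mostly of special characters. Please help me fix the error here: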
import os

import cv2
import numpy as np
import pytesseract
from PIL import Image


def get_string(img_path):
    # Folder of the input image, with a trailing separator (src_path was undefined before)
    src_path = os.path.join(os.path.dirname(os.path.abspath(img_path)), "")
    # Read image with opencv
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(img_path)
    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Apply dilation and erosion to remove some noise
    # (a 1x1 kernel leaves the image unchanged, so these two calls currently do nothing)
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)
    # Write the image after noise removal
    cv2.imwrite(src_path + "removed_noise.png", img)
    # Write the image after applying opencv to do some ...
    cv2.imwrite(src_path + "thres.png", img)
    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))
    return result


print('--- Start recognize text from image ---')
print(get_string("image_full_path.png"))
I tried Googling, but nothing helped. Could someone point me to the correct code? Output:
i } i er Oe a Pee pe be a
i j rye Se) PEE eet et ae ec?
j } a « o cy ” a @
: i : } Cand RET RE Petr eet PI ret
nif wad
fs | : : } wert
| ; a] |
wee | a
— th | cE i
ae | i
“ oe i j EYE }
en ct
. a f ae " i
- — ; - i! }
Answer 0 (score: 0)
The image is too bright, and its contrast is poor.
You need to improve the contrast and brightness.
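A minimal sketch of the contrast/brightness step, assuming a grayscale uint8 image like the one `cv2.cvtColor` produces in `get_string`. It applies the linear transform g(x) = alpha * f(x) + beta and saturates the result to [0, 255]. The function name and the default alpha/beta values are hypothetical and must be tuned for your scan: alpha > 1 stretches the contrast, and a negative beta darkens an over-bright image.

```python
import numpy as np


def adjust_contrast_brightness(gray, alpha=1.5, beta=-60):
    """Return alpha * gray + beta, saturated to the uint8 range [0, 255]."""
    # Work in float so intermediate values can go below 0 or above 255
    out = alpha * gray.astype(np.float32) + beta
    # Saturate, then convert back to 8-bit for OpenCV / Tesseract
    return np.clip(out, 0, 255).astype(np.uint8)
```

For example, call `img = adjust_contrast_brightness(img)` right after the grayscale conversion, then inspect the saved image to check whether the text stands out better.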
After that, apply some morphological operations to remove the noise.
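A minimal sketch of the noise-removal step in plain NumPy, assuming a grayscale image with dark text on a light background. It binarizes with a fixed threshold (the value 128 is a hypothetical default; Otsu thresholding via `cv2.threshold` with `cv2.THRESH_OTSU` is a common alternative), inverts so the text is white, and then applies a morphological opening (erosion followed by dilation) with a k x k square, which deletes white specks smaller than the kernel. In OpenCV the opening itself is `cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)`. The kernel must be smaller than the printed dots and strokes, otherwise the text is erased along with the noise.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(img, k, reduce_fn, pad_value):
    """Apply reduce_fn (np.min or np.max) over every k x k neighbourhood (k odd)."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, (k, k))
    return reduce_fn(windows, axis=(-2, -1))


def open_binary(mask, k=3):
    """Morphological opening of a 0/255 mask: erode (local min), then dilate (local max)."""
    # Pad erosion with 255 and dilation with 0 so the border does not affect the result
    eroded = _window_reduce(mask, k, np.min, pad_value=255)
    return _window_reduce(eroded, k, np.max, pad_value=0)


def remove_noise(gray, thresh=128, k=3):
    """Binarize dark-on-light text, remove small specks, return dark text on white."""
    # Text pixels (darker than thresh) become 255 so the opening acts on the text
    mask = np.where(gray < thresh, 255, 0).astype(np.uint8)
    opened = open_binary(mask, k)
    # Invert back: Tesseract generally works best with dark text on a light background
    return (255 - opened).astype(np.uint8)
```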
Then follow these tips on improving dot-matrix printer fonts for OCR:
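The tips referred to above are not reproduced here. As one illustration only, not necessarily what those tips recommend, a common trick for dot-matrix print is to thicken the dark strokes so that neighbouring dots merge into continuous characters before OCR. The sketch below is a grayscale erosion (local minimum over a k x k window) in NumPy on a dark-text-on-light uint8 image, comparable to `cv2.erode(gray, np.ones((k, k), np.uint8))`; the function name and the values of k and iterations are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def join_dots(gray, k=3, iterations=1):
    """Grow dark pixels with a k x k local minimum so nearby dots merge (k odd)."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    pad = k // 2
    out = gray
    for _ in range(iterations):
        # Pad with white so the image border does not darken the edges
        padded = np.pad(out, pad, mode="constant", constant_values=255)
        out = sliding_window_view(padded, (k, k)).min(axis=(-2, -1))
    return out
```

Too large a k or too many iterations will also merge neighbouring characters, so tune both on your own images.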