Text binarization

Time: 2014-10-03 19:31:29

Tags: python opencv tesseract

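I want to binarize this image so that I can use it with tesseract-ocr: http://imgur.com/A5u9xSA

So far, I have managed to get this: http://imgur.com/bU0FSt8

But I need a clean image containing only the text, without the black background regions, like this: imgur.com/KXQNErM

My current code: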

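import cv2

# Load the image as grayscale (flag 0); `path` is the input image path
img = cv2.imread(path, 0)
# Light Gaussian smoothing to reduce noise before thresholding
blur = cv2.GaussianBlur(img, (3, 3), 0)
# Adaptive Gaussian threshold on the blurred image: block size 405, constant C = 1
filtered = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 405, 1)
# Invert the binary image (swap black and white)
bitnot = cv2.bitwise_not(filtered)
cv2.imshow('image', bitnot)
cv2.imwrite("h2kcw2/out1.png", bitnot)
cv2.waitKey(0)
cv2.destroyAllWindows()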

1 answer:

Answer 0 (score: 3)

Plain global thresholding with a fixed value gives good results here:

[Image: Result]

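import cv2

# Load as grayscale, then apply a fixed global threshold:
# pixels above 70 become 0, all others become 255 (inverted binary)
img = cv2.imread(path, 0)
ret, thresh = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('image', thresh)
cv2.imwrite("h2kcw2/out1.png", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()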
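
To make clear what cv2.THRESH_BINARY_INV does to each pixel, here is a minimal pure-Python sketch of the same rule. The function name threshold_binary_inv and the sample values in patch are hypothetical, chosen only for illustration; real code should keep using cv2.threshold as shown above.

```python
def threshold_binary_inv(image, thresh, maxval=255):
    """Illustrate inverse binary thresholding on a list of pixel rows.

    Pixels strictly greater than `thresh` become 0;
    all other pixels become `maxval`.
    """
    return [[0 if pixel > thresh else maxval for pixel in row] for row in image]


# Hypothetical 2x4 grayscale patch (values 0-255), not taken from the real image.
patch = [
    [200, 40, 35, 210],
    [190, 70, 71, 220],
]

# Same threshold (70) and maximum value (255) as in the answer's code.
binary = threshold_binary_inv(patch, 70)

for row in binary:
    print(row)
```

Note that a pixel exactly equal to the threshold (70 in the second row) is not "greater than" it, so it is mapped to 255. Because the threshold is a single fixed value, it may need to be tuned for images with different lighting.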