Tesseract-OCR cannot read all characters in a JPEG image file

Time: 2020-02-03 10:14:05

Tags: python-3.x tesseract python-tesseract

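I am trying to read specific fields from PDF files. I convert the pages to images and then read the images with OpenCV and Tesseract OCR, but some of the text is skipped and never read. What could help with this? Code snippet: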

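import cv2
import pytesseract

# i (page index) and f (open output file) come from surrounding code that is not shown
filename = "page_" + str(i) + ".jpg"
img_cv = cv2.imread(filename)
custom_config = r'-l eng --oem 3 --psm 6 -c tessedit_char_blacklist=\/'
text = pytesseract.image_to_string(img_cv, config=custom_config)
text = text.replace('-\n', '')  # rejoin words hyphenated across line breaks
f.write(text)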
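
One thing that may be worth checking with the snippet above is whether the fixed `--psm 6` (which treats the page as a single uniform block of text) suits the page layout. The following is only a rough sketch, not a confirmed fix: it runs OCR once per page segmentation mode and keeps the run that returned the most non-whitespace characters, which is a crude heuristic. The helper names `build_config` and `ocr_with_psm_candidates`, the candidate list `(3, 4, 6, 11)`, and the injectable `ocr` parameter are all assumptions made for illustration; the character blacklist from the question is left out to keep the sketch short.

```python
def build_config(psm, lang="eng", oem=3):
    """Build a Tesseract config string shaped like the one in the question."""
    return f"-l {lang} --oem {oem} --psm {psm}"


def _visible_chars(text):
    """Count characters that are not whitespace."""
    return len("".join(text.split()))


def ocr_with_psm_candidates(image, psms=(3, 4, 6, 11), ocr=None):
    """Try several page segmentation modes and return (psm, text) for the
    run that produced the most non-whitespace characters.

    `ocr` is a hypothetical injection point; by default it is
    pytesseract.image_to_string, imported lazily.
    """
    if ocr is None:
        import pytesseract  # only needed when no custom ocr callable is passed
        ocr = pytesseract.image_to_string

    best_psm, best_text = None, ""
    for psm in psms:
        text = ocr(image, config=build_config(psm))
        text = text.replace("-\n", "")  # same de-hyphenation as in the question
        if best_psm is None or _visible_chars(text) > _visible_chars(best_text):
            best_psm, best_text = psm, text
    return best_psm, best_text
```

With the question's variables this could be called as `psm, text = ocr_with_psm_candidates(img_cv)` before `f.write(text)`. More characters does not guarantee more correct characters, so the output still needs to be checked against the page by eye.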

0 answers:

No answers