Pytesser set character whitelist

Time: 2017-04-30 10:35:58

Tags: python ocr tesseract pytesser

Does anyone know how to set the character whitelist for Pytesseract? I want it to only output A-z and 0-9. Is this possible? I have the following:

from PIL import Image
import pytesseract
img = Image.open('test.jpg')
result = pytesseract.image_to_string(img, config='-psm 6')

I'm getting other characters, such as / in place of a 1, so I would like to limit the set of possible characters.

1 answer:

Answer 0 (score: 12)

You can do this with the line below. Alternatively, you can set up a tesseract config file to do the same thing; see "Limit characters tesseract is looking for".

pytesseract.image_to_string(question_img, config="-c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz -psm 6")
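If the whitelist needs to change between calls, the config string from the line above can be assembled rather than typed out by hand. The following is only a sketch: `build_whitelist_config` and `ocr_with_whitelist` are hypothetical helper names, not part of pytesseract, and the `-psm` spelling is kept from this thread (Tesseract 4 and later expect `--psm`).

```python
import string


def build_whitelist_config(chars, psm=6):
    """Build a Tesseract config string that restricts output to `chars`.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    # Whitespace would split the -c argument, so reject it early.
    if any(c.isspace() for c in chars):
        raise ValueError("whitelist must not contain whitespace")
    # -psm is the spelling used in this thread (Tesseract 3.x);
    # Tesseract 4 and later expect --psm instead.
    return "-c tessedit_char_whitelist={} -psm {}".format(chars, psm)


def ocr_with_whitelist(image_path, chars):
    """Run OCR on image_path, asking Tesseract to emit only `chars`."""
    # Imported here so the config helper works without the OCR dependencies.
    from PIL import Image
    import pytesseract

    img = Image.open(image_path)
    return pytesseract.image_to_string(img, config=build_whitelist_config(chars))


# Digits plus lowercase letters, matching the whitelist used in the answer.
ALNUM_LOWER = string.digits + string.ascii_lowercase
config = build_whitelist_config(ALNUM_LOWER)
```

To also allow uppercase letters, as the question asks, `string.ascii_letters + string.digits` could be passed instead of `ALNUM_LOWER`.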

I'm sure there are other ways to get this working, but this is what worked for me.
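One such alternative, offered only as a fallback sketch and not something the answer itself describes, is to filter the OCR result in plain Python after the fact. Note the difference: a whitelist makes Tesseract choose among the allowed characters, while this merely drops anything else, so a misread `/` is removed rather than turned into a `1`. The name `keep_allowed` is a hypothetical helper.

```python
import string

# Letters and digits, as requested in the question.
ALLOWED = set(string.ascii_letters + string.digits)


def keep_allowed(text, allowed=ALLOWED, keep_whitespace=True):
    """Drop every character of `text` that is not in `allowed`.

    Whitespace is kept by default so words and lines stay separated.
    """
    return "".join(
        ch for ch in text
        if ch in allowed or (keep_whitespace and ch.isspace())
    )


# Made-up sample string standing in for an OCR result.
cleaned = keep_allowed("AB/23 x-9\n")
```

In practice this would be applied to the string returned by `pytesseract.image_to_string`.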