import tesseract

api = tesseract.TessBaseAPI()
api.SetOutputName("output")
api.Init(".", "eng", tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789")
api.SetVariable("tessedit_pageseg_mode", "7")
pixImage = tesseract.pixRead('img.jpg')
api.SetImage(pixImage)
outText = api.GetUTF8Text()
answer = outText.replace("\n", "").replace(" ", "")
print answer
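The image above, which shows "1", is recognized as "47" by tesseract, even though I have already set api.SetVariable("tessedit_pageseg_mode", "7"). How can I force it to recognize a single character in the image?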
P.S.: Python 2.7.3 + tesseract 3.0.2
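For context, here is a small sketch of what the page segmentation mode numbers mean in Tesseract 3.x. In that numbering, 7 treats the image as a single text line, while 10 treats it as a single character. The helper `apply_single_char_mode` is a hypothetical name introduced only for illustration; it uses nothing but the `SetVariable` call that already appears in the snippet, and whether switching the mode alone fixes the misread of this particular image is an assumption, not something verified here.

```python
# Page segmentation mode numbers as listed for Tesseract 3.x.
PAGE_SEG_MODES = {
    0: "orientation and script detection only",
    1: "automatic page segmentation with OSD",
    2: "automatic page segmentation, no OSD or OCR",
    3: "fully automatic page segmentation, no OSD (default)",
    4: "single column of text of variable sizes",
    5: "single uniform block of vertically aligned text",
    6: "single uniform block of text",
    7: "single text line",
    8: "single word",
    9: "single word in a circle",
    10: "single character",
}

SINGLE_CHAR_MODE = 10


def apply_single_char_mode(api, whitelist="0123456789"):
    """Hypothetical helper: configure a TessBaseAPI-like object for one digit.

    `api` only needs a SetVariable(name, value) method, which is the same
    call used in the snippet above.
    """
    api.SetVariable("tessedit_char_whitelist", whitelist)
    # Mode 10 (single character) instead of mode 7 (single text line).
    api.SetVariable("tessedit_pageseg_mode", str(SINGLE_CHAR_MODE))
    return api
```

With the snippet above, this would be called as `apply_single_char_mode(api)` after `api.Init(...)` and before `api.SetImage(pixImage)`.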