tesseract Python: how do I find which letter belongs to which word?

Asked: 2018-08-18 14:02:10

Tags: tesseract python-tesseract

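I have the following tesserocr code, which prints every letter in my simple black-and-white "hello world" image. But how do I map each letter to the word it belongs to? I am looking for a link between the letter index at the RIL.SYMBOL level and the word index at the RIL.WORD level. (I could of course do this myself by iterating over all words and symbols, but maybe Tesseract already does it?)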

import locale

# Set the C locale before importing tesserocr (some Tesseract 4 builds require it).
locale.setlocale(locale.LC_ALL, 'C')

import tesserocr
from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

print(tesserocr.tesseract_version())

image = Image.open('helloworld.jpg')

# https://stackoverflow.com/questions/41384732/how-do-i-use-the-tesseract-api-to-iterate-over-words
with PyTessBaseAPI() as api:
    api.SetImage(image)
    # Variables must be set before Recognize() to affect this recognition pass.
    api.SetVariable("save_blob_choices", "T")
    api.Recognize()
    ri = api.GetIterator()
    level = RIL.SYMBOL
    # Print each recognized symbol with its confidence and bounding box.
    for r in tesserocr.iterate_level(ri, level):
        symbol = r.GetUTF8Text(level)
        conf = r.Confidence(level)
        bbox = r.BoundingBoxInternal(level)
        print(level, symbol, conf, bbox)
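
For illustration, the kind of link being asked about could be built in a single pass by counting word boundaries while walking the symbols. This is only a sketch under assumptions: it assumes the positions yielded by tesserocr.iterate_level expose IsAtBeginningOf(level), as the underlying Tesseract PageIterator does. The helper name symbols_with_word_index and the FakeSymbol stand-in are hypothetical and exist only so the snippet runs without an image or a Tesseract install.

```python
from typing import Any, Iterable, Iterator, Tuple


def symbols_with_word_index(
    positions: Iterable[Any], symbol_level: Any, word_level: Any
) -> Iterator[Tuple[int, int, str]]:
    """Yield (word_index, index_in_word, symbol_text) for each symbol position.

    `positions` is expected to yield iterator positions at symbol level, for
    example tesserocr.iterate_level(ri, RIL.SYMBOL). Each position must provide
    IsAtBeginningOf(level) and GetUTF8Text(level) (assumed, see lead-in).
    """
    word_index = -1
    index_in_word = 0
    for pos in positions:
        # A symbol that starts a word (or the very first symbol) opens a new group.
        if pos.IsAtBeginningOf(word_level) or word_index < 0:
            word_index += 1
            index_in_word = 0
        # Read the text immediately: iterate_level reuses one iterator object.
        yield word_index, index_in_word, pos.GetUTF8Text(symbol_level)
        index_in_word += 1


class FakeSymbol:
    """Hypothetical stand-in for a tesserocr iterator position (demo only)."""

    def __init__(self, text: str, starts_word: bool) -> None:
        self.text = text
        self.starts_word = starts_word

    def IsAtBeginningOf(self, level: Any) -> bool:
        return self.starts_word

    def GetUTF8Text(self, level: Any) -> str:
        return self.text


# Four symbols forming two words: "hi" and "yo".
demo = [FakeSymbol(c, i in (0, 2)) for i, c in enumerate("hiyo")]
mapping = list(symbols_with_word_index(demo, "SYMBOL", "WORD"))
print(mapping)
```

With real OCR output the call would presumably be symbols_with_word_index(tesserocr.iterate_level(ri, RIL.SYMBOL), RIL.SYMBOL, RIL.WORD), placed inside the with block after Recognize().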

0 answers:

No answers yet