
Pytesseract: reads PDF columns horizontally (line by line) instead of vertically (column by column)

Time: 2019-05-06 14:33:09

Tags: python ocr python-tesseract

Code:

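import pytesseract
from PIL import Image as pilImg

# Note: PIL cannot open PDF files, so the page must first be converted to an image.
imgPath = '1.pdf'
text = pytesseract.image_to_string(pilImg.open(imgPath), 'eng')
# Keep only ASCII characters
text = ''.join(filter(lambda x: ord(x) < 128, text))
newalines = repr(text)  # unused below
alines = text.split('\n')
print(alines)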
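
One possible workaround, offered only as a minimal sketch and not as a confirmed fix: crop the page image into its columns and OCR each crop separately, so the text comes out column by column. The helpers `column_boxes` and `ocr_by_columns` are hypothetical names introduced here, and the sketch assumes the columns are of equal width and that the page has already been converted to a PIL image.

```python
def column_boxes(width, height, n_columns):
    """Return (left, upper, right, lower) crop boxes for equal-width columns."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    # Integer column edges from 0 to width, so the boxes tile the page exactly
    edges = [i * width // n_columns for i in range(n_columns + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(n_columns)]


def ocr_by_columns(image, n_columns, lang="eng"):
    """OCR each column crop from left to right and join the results."""
    import pytesseract  # imported lazily so column_boxes works without it

    parts = []
    for box in column_boxes(image.width, image.height, n_columns):
        # --psm 6 asks Tesseract to treat the crop as a single uniform block of text
        parts.append(
            pytesseract.image_to_string(image.crop(box), lang=lang, config="--psm 6")
        )
    return "\n".join(parts)
```

For a two-column page this would be called as `ocr_by_columns(page_image, 2)`, where `page_image` is a PIL image of the page. Real layouts with unequal columns or wide gutters would need the crop boxes adjusted by hand.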

0 answers:

No answers