Code: Pytesseract reads PDF columns horizontally (row by row) instead of vertically (column by column)
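import pytesseract
from pdf2image import convert_from_path  # Pillow cannot open PDFs; render pages to images first (needs poppler)

pdfPath = '1.pdf'
pages = convert_from_path(pdfPath)  # one PIL image per page
# OCR every page with the English model and join the results
text = '\n'.join(pytesseract.image_to_string(page, lang='eng') for page in pages)
text = ''.join(filter(lambda x: ord(x) < 128, text))  # drop non-ASCII characters
alines = text.split('\n')
print(alines)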
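One possible workaround, sketched here under stated assumptions, is to ask Tesseract for word bounding boxes with `pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)` and reorder the words yourself. The helper `read_columns` is hypothetical: it assumes a two-column page whose gutter lies at a known x-coordinate `split_x` (in pixels), uses only the `text`, `left` and `top` keys of that dict, and treats words whose `top` values are within `line_tol` pixels of a line's first word as the same line. It is a simple heuristic, not Tesseract's own layout analysis.

```python
from typing import Dict, List, Tuple

Word = Tuple[int, int, str]  # (top, left, text)


def read_columns(data: Dict[str, list], split_x: int, line_tol: int = 10) -> List[str]:
    """Rebuild OCR text column by column instead of row by row.

    `data` is assumed to look like the dict returned by
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT);
    only its 'text', 'left' and 'top' lists are used. `split_x` is an
    assumed x-coordinate of the gutter between two columns.
    """
    columns: List[List[Word]] = [[], []]
    for text, left, top in zip(data["text"], data["left"], data["top"]):
        if not str(text).strip():
            continue  # skip empty entries (page/block/paragraph/line rows)
        # words starting left of the gutter go to column 0, the rest to column 1
        columns[0 if left < split_x else 1].append((top, left, str(text)))

    lines: List[str] = []
    for words in columns:
        words.sort()  # top-to-bottom, so words of one line end up adjacent
        rows: List[List[Word]] = []
        for word in words:
            # same line if close enough to the first word of the current row
            if rows and word[0] - rows[-1][0][0] <= line_tol:
                rows[-1].append(word)
            else:
                rows.append([word])
        for row in rows:
            # order words inside a line left-to-right before joining
            lines.append(" ".join(t for _, _, t in sorted(row, key=lambda w: w[1])))
    return lines
```

Usage, assuming the gutter sits at mid-page: `for line in read_columns(pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT), split_x=page.width // 2): print(line)`. Grouping by coordinates rather than by Tesseract's `line_num` is deliberate: when Tesseract merges both columns into one line, `line_num` is shared across the columns and cannot separate them.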