When I run this Python code on Windows 10:
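```python
from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable installed on this machine
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

im = Image.open("news.png")
# Run OCR on the whole page using the English language data
text = pytesseract.image_to_string(im, lang='eng')
print(text.encode("utf-8"))
```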
This is what gets printed:
```
THE MN HERALD gm:-
} HEAT INVASIUN SIAHIHI
Builefins Gen. Eisenhower Bares Landings in France
dtmmvΓÇÿ By Both Sea, Air from Le Havre North;
MSW \ Fierce Fight at Caen, Dunlaerque Rocked
EM?ΓÇ¥ 7 , w W! Paratroops Hit Nazis
'k AWMWM In Pre-Dawn Assaults
swam HEADQUARTERSΓÇÿ ALLIED EXPE
mmmkv poms. Im a (APIΓÇöΓÇÿAmcrfwl.
mm m: Canadian lmvm'umlldm mum anu:
ma mm, launrmn: m. mm: wwmmmxlfmy
aprn m hum mm mm mm m" supra"!
mummy, c". 1mm 17' 5mmΓÇ¥, m... ΓÇ£we
m mm mm┬╗; mm mu ΓÇ£mm nun m Gar
man mmlm n┬╗ m: can
u. mm, which
mm" m nnmmntl
(mum
(ammunlaur N. 1;
r m: cwmand 07 Cm. Elunhown MM
m: Yam: :unpurltd by gm, m┬╗ Mm. mm.
(mm, Mind mm m, mm. m m mum
» man «2mm
.m and ΓÇ£a [um
72 A M. Cvunmfch
mwmnmummmn.ΓÇÖ
```
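The `ΓÇÿ`, `ΓÇö`, `ΓÇ£` and `┬╗` sequences look like a separate problem from the recognition errors: they are what UTF-8 bytes turn into when displayed through code page 437, the default code page of many Windows consoles. For example, the UTF-8 bytes `E2 80 99` of a right single quote (’) show up as `ΓÇÖ`. The sketch below reproduces and reverses that mix-up using only the standard library; the function names are hypothetical, chosen for illustration.

```python
def show_as_cp437(text):
    """Mimic a console that renders UTF-8 bytes using code page 437."""
    return text.encode("utf-8").decode("cp437")


def repair_cp437(garbled):
    """Reverse the mix-up: re-encode as cp437, then decode the bytes as UTF-8."""
    return garbled.encode("cp437").decode("utf-8")
```

For instance, `show_as_cp437("\u2019")` yields `ΓÇÖ`, and `repair_cp437` turns it back into ’. Writing `text` to a file opened with `encoding="utf-8"` and viewing it in a UTF-8-aware editor sidesteps the console code page entirely.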
Any idea why I'm getting this strange output? Is this a limitation of Tesseract, or am I doing something stupid?
Thanks, Ed.
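A scanned newspaper page (small print, several columns, a large masthead) passed in as-is is often a hard input for Tesseract, so output like this does not necessarily mean the code is wrong. A common first step is to preprocess the image: convert it to greyscale, upscale it, and binarise it before OCR. The sketch below assumes that approach; `otsu_threshold` and `ocr_binarized` are hypothetical helpers, Otsu's method is one thresholding choice among several, and the `--psm` flag syntax assumes Tesseract 4 or later.

```python
def otsu_threshold(histogram):
    """Return the grey level that best splits a 256-bin histogram into dark
    and light classes (Otsu's method: maximise between-class variance)."""
    total = sum(histogram)
    if total == 0:
        return 0
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_bg = 0      # number of pixels at or below the candidate level
    sum_bg = 0         # sum of grey levels of those pixels
    best_level, best_var = 0, -1.0
    for level, count in enumerate(histogram):
        weight_bg += count
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += level * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_level = between, level
    return best_level


def ocr_binarized(path, lang="eng", psm=3, scale=2):
    """Hypothetical helper: greyscale, upscale, binarise, then OCR."""
    # Imported here so otsu_threshold works without Pillow or Tesseract installed
    from PIL import Image
    import pytesseract

    gray = Image.open(path).convert("L")
    if scale != 1:
        # Upscaling can help when the characters are only a few pixels tall
        gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    level = otsu_threshold(gray.histogram())
    # Dark pixels (ink) become black, everything else white
    bw = gray.point(lambda p: 255 if p > level else 0)
    # --psm 3 is fully automatic page segmentation; other modes may suit columns better
    return pytesseract.image_to_string(bw, lang=lang, config=f"--psm {psm}")
```

Calling `ocr_binarized("news.png")` replaces the direct `image_to_string` call. If the result is still poor, cropping a single article or column and trying other page segmentation modes (for example `psm=4` or `psm=6`) is worth testing, since a full multi-column page is one of the harder layouts for automatic segmentation.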