Weird characters from tesseract

Time: 2017-08-02 22:19:49

Tags: python ocr tesseract pillow

When I run this Python code on Windows 10:

from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
im = Image.open("news.png")

text = pytesseract.image_to_string(im, lang='eng')

print(text.encode("utf-8"))

On this image.

I get this printed out:

    THE MN HERALD gm:-                                 

} HEAT INVASIUN SIAHIHI                            

Builefins Gen. Eisenhower Bares Landings in France

dtmmvΓÇÿ By Both Sea, Air from Le Havre North;     

MSW \ Fierce Fight at Caen, Dunlaerque Rocked      

EM?ΓÇ¥ 7 , w W! Paratroops Hit Nazis               
'k AWMWM In Pre-Dawn Assaults                      

swam HEADQUARTERSΓÇÿ ALLIED EXPE                   
mmmkv poms. Im a (APIΓÇöΓÇÿAmcrfwl.                
mm m: Canadian lmvm'umlldm mum anu:                
ma mm, launrmn: m. mm: wwmmmxlfmy                  
aprn m hum mm mm mm m" supra"!                     
mummy, c". 1mm 17' 5mmΓÇ¥, m... ΓÇ£we              
m mm mm┬╗; mm mu ΓÇ£mm nun m Gar                   
man mmlm n┬╗ m: can                                

u. mm, which                                       
mm" m nnmmntl                                      
(mum                                               
(ammunlaur N. 1;                                   
r m: cwmand 07 Cm. Elunhown MM                     
m: Yam: :unpurltd by gm, m┬╗ Mm. mm.               
(mm, Mind mm m, mm. m m mum                        
» man «2mm                                       

.m and ΓÇ£a [um                                    
72 A M. Cvunmfch                                   
mwmnmummmn.ΓÇÖ                                    

Any ideas why I am getting this weird output? Is this a limitation of tesseract, or am I being stupid?

Thanks, Ed.
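
One part of the output can be checked independently of OCR quality. Sequences such as `ΓÇÿ`, `ΓÇ¥` and `┬╗` are consistent with UTF-8 bytes being displayed through the legacy code page 437. This could happen if the bytes from text.encode("utf-8") were written straight to a Windows console. The sketch below assumes that code page; the helper names as_seen_on_cp437 and undo_cp437 are hypothetical and are not part of pytesseract. It does not explain the misread words themselves, only the multi-character symbol runs.

```python
# Sketch: reproduce the odd symbol runs by viewing UTF-8 bytes as code page 437.
# Assumption: the console that displayed the output used cp437.


def as_seen_on_cp437(text):
    """Encode text as UTF-8, then decode those bytes as if they were cp437."""
    return text.encode("utf-8").decode("cp437")


def undo_cp437(garbled):
    """Reverse the mix-up: re-encode as cp437, then decode the bytes as UTF-8."""
    return garbled.encode("cp437").decode("utf-8")


if __name__ == "__main__":
    # Curly quotes, an em dash and a guillemet, written as escapes.
    for ch in ["\u2018", "\u2019", "\u201c", "\u201d", "\u2014", "\u00bb"]:
        print(repr(ch), "->", as_seen_on_cp437(ch))

    # One line taken from the output in the question.
    print(undo_cp437("HEADQUARTERSΓÇÿ ALLIED EXPE"))
```

If the assumption holds, each three-character run in the printed output maps back to a single typographic character (a curly quote or dash) that tesseract recognised, so printing the string itself instead of its encoded bytes may remove that part of the noise.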

0 answers:

No answers