How do I extract text from an image in Python using Scrapy?

Asked: 2017-09-08 10:17:35

Tags: python-2.7 scrapy

This is the data I want to scrape from the page:

Academic Title   ---------  Clinical Fellow in Surgery
Department ---------  Surgery-Brigham and Women's Hospital
Institution---------  Brigham and Women's Hospital
Address    --------- Brigham and Womens Hospital
           --------- c/o Surgery Education
           ---------  75 Francis St
           ---------  Boston, MA 02115
Phone      --------- 617/732-6861
Email      --------- email as image

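Here is the HTML. In the last div, how can I extract the email text from the image? On the website the email address is displayed as an image, so it cannot be copied or clicked. How can I extract it with Scrapy in Python 2.7.13?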

<div class="person-line">
  <span>Department</span>
  <div>Surgery-Brigham and Women's Hospital</div>
</div>

<div class="person-line">
  <span>Institution</span>
  <div>Brigham and Women's Hospital</div>
</div>

<div class="person-line">
 <span>Address</span>
 <div>
  Brigham and Womens Hospital<br/>      c/o Surgery Education<br/>      75 Francis St  <br/>      Boston, MA 02115<br/>    
 </div>  
</div>
<div class="person-line">
  <span>Phone</span>
  <div>617/732-6861</div>
</div>

<div class="person-line">
  <span>Email</span>
  <div>
   <img src="/sites/default/files/hms-faculty-emails/BX0UVXkP.jpg" />
  </div>
</div>

1 answer:

Answer 0 (score: 0)

First, get the image's src attribute:

src = response.css('div.person-line > div > img::attr("src")').extract_first()
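The src value in the question's HTML is a site-relative path, so it has to be resolved against the page URL before the image can be requested. A minimal sketch of that resolution with the standard library follows; the page URL is a hypothetical placeholder, since the question does not give the real one. Inside a Scrapy callback, response.urljoin(src) performs the same resolution against the response's own URL.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7, as in the question

# Hypothetical page URL (assumption; the real profile URL is not in the question)
page_url = "https://example.org/people/some-profile"
# The src attribute exactly as it appears in the question's HTML
src = "/sites/default/files/hms-faculty-emails/BX0UVXkP.jpg"

# A path starting with "/" replaces the whole path of the page URL
image_url = urljoin(page_url, src)
print(image_url)
```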

Then you can use pytesseract to get the text from the image:
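import io
import urllib2  # Python 2.7, as in the question; use urllib.request on Python 3

import pytesseract
from PIL import Image

# src is a site-relative URL, not a local file, so download the image first
image_bytes = urllib2.urlopen(response.urljoin(src)).read()
email = pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))
print(email)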