Here is the data as it appears on the page:
Academic Title --------- Clinical Fellow in Surgery
Department --------- Surgery-Brigham and Women's Hospital
Institution--------- Brigham and Women's Hospital
Address --------- Brigham and Womens Hospital
--------- c/o Surgery Education
--------- 75 Francis St
--------- Boston, MA 02115
Phone --------- 617/732-6861
Email --------- email as image
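I am putting the markup here. In the last div, how can I extract the email text from the image? On the website the email is displayed as an image, so it cannot be copied or clicked. Please tell me how to extract it in Scrapy with Python 2.7.13.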
<div class="person-line">
<span>Department</span>
<div>Surgery-Brigham and Women's Hospital</div>
</div>
<div class="person-line">
<span>Institution</span>
<div>Brigham and Women's Hospital</div>
</div>
<div class="person-line">
<span>Address</span>
<div>
Brigham and Womens Hospital<br/> c/o Surgery Education<br/> 75 Francis St <br/> Boston, MA 02115<br/>
</div>
</div>
<div class="person-line">
<span>Phone</span>
<div>617/732-6861</div>
</div>
<div class="person-line">
<span>Email</span>
<div>
<img src="/sites/default/files/hms-faculty-emails/BX0UVXkP.jpg" />
</div>
</div>
Answer 0 (score: 0)
First, get the src of the image:
src = response.css('div.person-line > div > img::attr("src")').extract_first()
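The src extracted here is site-relative, so it probably needs to be resolved against the page URL before the image can be fetched. Inside a Scrapy callback, `response.urljoin(src)` does this. The standalone sketch below shows the same resolution with the standard library. It assumes Python 3 (on Python 2.7, `urljoin` lives in the `urlparse` module), and `page_url` is a hypothetical placeholder, not the real site address.

```python
from urllib.parse import urljoin

# Hypothetical page URL, standing in for response.url inside the spider.
page_url = "https://example.org/people/12345"
# The relative path taken from the <img> tag in the question.
src = "/sites/default/files/hms-faculty-emails/BX0UVXkP.jpg"

# A src starting with "/" replaces the whole path of the page URL.
image_url = urljoin(page_url, src)
print(image_url)
```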
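Then you can use pytesseract to get the text from the image. Note that src is a relative URL, so the image has to be downloaded first; Image.open expects a local file path or a file object, not a URL:
import pytesseract
from PIL import Image

# image_path: local path where the downloaded image was saved
email = pytesseract.image_to_string(Image.open(image_path))
print(email)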
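As a rough sketch of how the OCR step could work on downloaded bytes instead of a file on disk, the helpers below wrap `pytesseract.image_to_string` and tidy its output. The function names `clean_email` and `email_from_image_bytes` and the regular expression are illustrative assumptions, not part of Scrapy or pytesseract. The sketch also assumes the image contains nothing but the address, and that the Tesseract binary is installed.

```python
import io
import re

# Loose pattern for pulling an address out of noisy OCR text
# (an assumption for illustration, not a full email validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def clean_email(ocr_text):
    """Drop whitespace that OCR may insert; return the first email-like match or None."""
    compact = "".join(ocr_text.split())
    match = EMAIL_RE.search(compact)
    return match.group(0) if match else None


def email_from_image_bytes(image_bytes):
    """OCR raw image bytes, e.g. the body of the response for the image URL."""
    # Imported here so clean_email stays usable without Tesseract installed.
    import pytesseract
    from PIL import Image

    image = Image.open(io.BytesIO(image_bytes))
    return clean_email(pytesseract.image_to_string(image))
```

In a spider, one way to use this would be to yield `scrapy.Request(response.urljoin(src), callback=...)` from the profile page callback and call `email_from_image_bytes(response.body)` in the image callback. OCR can misread characters such as `l` and `1`, so the result is worth spot-checking.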