Question

I have this HTML code:

<div class="rendering rendering_person rendering_short rendering_person_short">
  <h3 class="title">
    <a rel="Person" href="https://moh-it.pure.elsevier.com/en/persons/massimo-eraldo-abate" class="link person"><span>Massimo Eraldo Abate</span></a>
  </h3>
  <ul class="relations email">
    <li class="email"><a href="massimo.abate@ior.it" class="link"><span>massimo.abate@ior.it</span></a></li>
  </ul>
  <p class="type"><span class="family">Person: </span>Academic</p>
</div>

How can I extract "Massimo Eraldo Abate" from the code above?

Please help me.

Answer 1

You can use the following to extract the name:

response.xpath('//h3[@class="title"]/a/span/text()').extract_first()

Also, take a look at this Scrapinghub blog post to learn more about XPath.

Answer 2

Take a look at this page in the Scrapy docs; there are many ways to extract text:

>>> from scrapy.selector import Selector
>>> from scrapy.http import HtmlResponse
>>> body = '<html><body><span>good</span></body></html>'
>>> Selector(text=body).xpath('//span/text()').extract()
['good']

>>> response = HtmlResponse(url='http://example.com', body=body, encoding='utf-8')
>>> Selector(response=response).xpath('//span/text()').extract()
['good']

How to get text using Scrapy in Python?
