Question

I want to scrape the URL from here:

I tried these:

response.xpath('//header[@class="geodir-entry-header"]/a/@href').extract()

response.xpath('//div[class="geodir-content "]/header/a/@href').extract()

response.xpath('//div[@class="geodir-content "]/header[@class="geodir-entry-header"]/a/@href').extract()

All of them return an empty list.

Answer 1

Does

response.xpath('//h3[@class="geodir-entry-title"]/a/@href').extract()

or

response.xpath('//header[@class="geodir-entry-header"]/h3/a/@href').extract()

work for you?

It looks like you missed the h3 tag that contains the a tag you need.
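To see why the missing step matters, here is a minimal sketch. The markup is hypothetical (the real page is not shown in the question), and the standard library's ElementTree stands in for Scrapy's selector so the snippet runs without Scrapy installed. ElementTree supports only a subset of XPath, so the href is read with .get() rather than /@href.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the actual page from the question is not shown.
HTML = """
<div class="geodir-content ">
  <header class="geodir-entry-header">
    <h3 class="geodir-entry-title">
      <a href="https://example.com/listing-1">Listing 1</a>
    </h3>
  </header>
</div>
""".strip()

root = ET.fromstring(HTML)

# "/a" only matches <a> elements that are direct children of <header>.
direct = root.findall(".//header[@class='geodir-entry-header']/a")

# Stepping through the intermediate <h3> reaches the link.
via_h3 = root.findall(".//header[@class='geodir-entry-header']/h3/a")

direct_hrefs = [a.get("href") for a in direct]
via_h3_hrefs = [a.get("href") for a in via_h3]

print(direct_hrefs)
print(via_h3_hrefs)
```

The same rule applies in Scrapy: a single "/" selects direct children only, so every intermediate element has to appear in the path (or "//" has to be used to skip levels).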

Answer 2

All you need to do is add the h3 tag that you accidentally left out.

response.xpath('//header[@class="geodir-entry-header"]/h3/a/@href').extract()

And if you only want the first URL, use

response.xpath('//header[@class="geodir-entry-header"]/h3/a/@href').extract_first()

or

response.xpath('//header[@class="geodir-entry-header"]/h3/a/@href').extract()[0]
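The two forms behave differently when nothing matches: extract_first() returns None, while extract()[0] raises IndexError because it indexes an empty list. A rough sketch of that difference, again using the standard library's ElementTree as a stand-in for Scrapy (find plays the role of extract_first(), findall(...)[0] the role of extract()[0]); the markup is hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical page with no matching <header>, to mimic an empty result.
root = ET.fromstring("<div><p>no listings</p></div>")
path = ".//header[@class='geodir-entry-header']/h3/a"

# Comparable to extract_first(): None when nothing matches.
first = root.find(path)
first_href = first.get("href") if first is not None else None

# Comparable to extract()[0]: indexing an empty list raises IndexError.
try:
    indexed_href = root.findall(path)[0].get("href")
    raised = False
except IndexError:
    indexed_href = None
    raised = True

print(first_href, raised)
```

So extract_first() is usually the safer choice inside a spider, where some pages may lack the element.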

Scrapy + Xpath + Python: unable to scrape data points
