Question

I'm scraping a page with a simple structure. I used Chrome to work out the XPath I need, but it doesn't work in this case.

I have XPaths like these:

/html/body/text()[1]

/html/body/div[9]/p/span[2]/text()

But when I try:

response.xpath('/html/body/div[9]/p/span[2]/text()')

or

response.xpath('/html/body/div[9]/p/span[2]/text()').extract()

I don't get any results back, just an empty list.

Answer 1

You need to fix your XPath expression. A demo from the Scrapy shell:

$ scrapy shell "http://www.bbb.org/boston/business-reviews/appliances-major-dealers/dracut-appliance-center-inc-in-dracut-ma-76793/ReadReviews?page=1&exp=1" -s USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"
>>> print(response.xpath("//span[. = 'Comment from the Business']/following-sibling::span/text()").extract_first())
Mr, ********,
Thank you very much for your positive review.  It's great to hear your install went smoothly. *** (our sales manager of over 45 years) and *** (Sales for over 10 years) have been notified of this positive response and truly appreciated it.  We look forward to service you again in the future!!

Using Scrapy on a simple page
