Question

I am new to Scrapy and am using Scrapy 0.14.4. I just want to print the title and link, as in the example below.

Here is my spider:

from scrapy.spider import BaseSpider

class XxxSpider(BaseSpider):
    name = "xxx"
    allowed_domains = ["xxx.xxx.xxx"]
    start_urls = ["http://xxx.xxx.com/jobs/"]


    def parse(self, response):
        for sel in response.xpath("//div[@id='job_listings']/a"):
            title = sel.xpath('./text()').extract()
            link = sel.xpath('./@href').extract()
            print title, link

What is missing here?

Answer 1

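The problem is that you are using an old version of Scrapy, in which selectors are not built into the response object. To verify this, see the relevant documentation: http://doc.scrapy.org/en/0.14/topics/request-response.html

To fix it, wrap the response in a selector. You can then run your XPath queries through the selector's select() method: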

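from scrapy.selector import HtmlXPathSelector

def parse(self, response):
    # Scrapy 0.14 responses have no .xpath(), so wrap the response in a selector
    hxs = HtmlXPathSelector(response)
    for sel in hxs.select("//div[@id='job_listings']/a"):
        title = sel.select('./text()').extract()
        link = sel.select('./@href').extract()
        print title, link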
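
If the same spider has to run on both old and newer Scrapy installs, one option is to check whether the response offers xpath() before falling back to the wrapper. The following is only a minimal sketch: select_nodes is a hypothetical helper name, not part of Scrapy, and it assumes that a response without an xpath attribute comes from an old release such as 0.14 where HtmlXPathSelector is importable.

```python
def select_nodes(response, query):
    """Run an XPath query on a response from an old or a newer Scrapy.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    if hasattr(response, "xpath"):
        # Newer Scrapy: the response exposes the selector shortcut directly.
        return response.xpath(query)
    # Older Scrapy (assumed here to be a release like 0.14): wrap the
    # response in a selector first, then query it with select().
    from scrapy.selector import HtmlXPathSelector
    return HtmlXPathSelector(response).select(query)
```

Inside parse this would be called as select_nodes(response, "//div[@id='job_listings']/a"). Note that the returned nodes still differ between versions: nested queries use select() on the old selectors and xpath() on the newer ones.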

I got an AttributeError: 'HtmlResponse' object has no attribute 'xpath' in Scrapy
