Question

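I'm crawling with Scrapy and Selenium. The site uses AJAX for pagination, so the URL never changes and, as a result, response.body never changes either. I want to click the pagination control with Selenium, take self.driver.page_source, and use it instead of response.body. So I wrote this code: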

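    # After Selenium has clicked the pagination control, wrap the rendered
    # HTML in a Scrapy response so the usual CSS selectors can be used on it.
    res = scrapy.http.TextResponse(url=self.driver.current_url,
                                   body=self.driver.page_source,
                                   encoding='utf-8')
    print(str(res))  # nothing is printed!

    i = 0
    for quote in res.css("#ctl00_ContentPlaceHolder1_Grd_Dr_DXMainTable > tr.dxgvDataRow_Office2003Blue"):
        i = i + 1
        item = dict()
        # The first cell of each data row holds the numeric id.
        item['id'] = int(quote.css("td.dxgv:nth-child(1)::text").extract_first())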

No error is raised!

Answer 1

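You can use the response.replace() method to replace the body of the original response in Scrapy:

def parse(self, response):
    # replace() returns a new response object, so reassign it.
    # Passing encoding='utf-8' makes sure the str from page_source is
    # encoded with a known charset.
    response = response.replace(body=self.driver.page_source, encoding='utf-8')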
