Python / Scrapy - response.replace() method not working?

Asked: 2017-06-15 13:13:21

Tags: python scrapy

When I try to change response.url with response.replace before yielding the request, the URL stays the same. The syntax seems to be correct, though.

print(response.url)
response.replace(url='https://techcrunch.com/search/heartbleed#stq=heartbleed&stp=2')
print(response.url)  

next_link = self.driver.find_element(By.XPATH, "//a[@class='page-link next']")
nextpage = next_link.get_attribute("href")
yield scrapy.Request(url=nextpage, dont_filter=False)

Notes:
1. I am assigning the URL twice (which would not be necessary if it worked). grrr
2. nextpage is exactly the same URL as the one on line 2 of the code.

Output:

https://techcrunch.com/search/heartbleed
https://techcrunch.com/search/heartbleed
2017-06-15 15:09:55 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:56740/wd/hub/session/e3ba0740-51cb-11e7-acb6-f1825cec3f42/element {"using": "xpath", "sessionId": "e3ba0740-51cb-11e7-acb6-f1825cec3f42", "value": "//a[@class='page-link next']"}
2017-06-15 15:09:55 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2017-06-15 15:09:55 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:56740/wd/hub/session/e3ba0740-51cb-11e7-acb6-f1825cec3f42/element/:wdc:1497532195411/attribute/href {"sessionId": "e3ba0740-51cb-11e7-acb6-f1825cec3f42", "name": "href", "id": ":wdc:1497532195411"}
2017-06-15 15:09:55 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request  

I think this is why I can't get to the other links: the response always stays on the same page instead of following the new link.

1 answer:

Answer 0 (score: 3)

I guess the replace method does not modify the response in place but returns the result instead:

replace([url, status, headers, body, request, flags, cls])
Returns a Response object with the same members, except for those members given new values by whichever keyword arguments are specified.

So I would try something like this:

new_response = response.replace(whatever=value)
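
To make the difference concrete, here is a minimal sketch of the return-a-copy behaviour described in the quoted documentation. It uses only the standard library so it runs without Scrapy: FakeResponse is a hypothetical stand-in written for this illustration, not Scrapy's own class. Assuming the real response behaves as the documentation says, the same two calls should give the same result in a spider.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response (not Scrapy's class)."""

    url: str
    status: int = 200

    def replace(self, **kwargs):
        # Build and return a new object; self is left unchanged.
        return dataclasses.replace(self, **kwargs)


response = FakeResponse(url="https://techcrunch.com/search/heartbleed")
new_url = "https://techcrunch.com/search/heartbleed#stq=heartbleed&stp=2"

# Calling replace() and discarding the return value changes nothing,
# which matches the two identical URLs in the question's output.
response.replace(url=new_url)
print(response.url)  # still the original URL

# Keeping the return value gives a copy with the new URL.
new_response = response.replace(url=new_url)
print(new_response.url)  # the URL ending in stp=2
```

Note that replace only copies the object with the new attribute; it does not download anything. Fetching the next page still goes through the yielded scrapy.Request, as in the question's code.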