In my Scrapy spider, I have the following custom_settings:
custom_settings = {
    'DEFAULT_REQUEST_HEADERS': {
        'Connection': 'Keep-Alive'
    }
}
I even tried setting the headers in scrapy.Request() as follows:
headers = {}
headers['Connection'] : 'Keep-Alive'
headers['User-Agent'] : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
for url in urls:
    yield scrapy.Request(url, headers=headers, callback=self.parse)
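One thing worth checking in the snippet above: as posted, the two header lines use a colon rather than `=`. In Python 3.6+ that form parses as a variable annotation, not an assignment, so the dictionary stays empty and no custom headers reach scrapy.Request(). That would be consistent with the captured request still showing the default Scrapy User-Agent. The sketch below uses only plain dictionaries (the names `annotated` and `assigned` are illustrative) and assumes the colons are really in the code and not a transcription slip. It does not by itself explain where `Connection: close` comes from.

```python
# A colon after a subscript is an annotation, not an assignment (Python 3.6+).
annotated = {}
annotated['Connection']: 'Keep-Alive'   # nothing is stored in the dict

# An equals sign performs the assignment that was presumably intended.
assigned = {}
assigned['Connection'] = 'Keep-Alive'   # the key is stored

print(annotated)  # {}
print(assigned)   # {'Connection': 'Keep-Alive'}
```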
But the request does not override the value. Instead, it just appends another "Connection" entry to the request headers.
Request packet:
GET http://example.com HTTP/1.1
Connection: close
Connection: Keep-Alive
User-Agent: Scrapy/1.4.0 (+http://scrapy.org)
Accept-Encoding: gzip,deflate
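To make the duplication in the capture above easy to check, here is a minimal sketch that groups the captured header lines by name. It works on the request text only and does not involve Scrapy; `group_headers` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

# Captured request text, copied from the packet shown in the question.
raw_request = """GET http://example.com HTTP/1.1
Connection: close
Connection: Keep-Alive
User-Agent: Scrapy/1.4.0 (+http://scrapy.org)
Accept-Encoding: gzip,deflate"""


def group_headers(raw):
    """Map each lower-cased header name to the list of values seen for it."""
    grouped = defaultdict(list)
    for line in raw.splitlines()[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            grouped[name.strip().lower()].append(value.strip())
    return dict(grouped)


headers_seen = group_headers(raw_request)
print(headers_seen["connection"])  # ['close', 'Keep-Alive']
```

Two values under the same name confirm that the header was sent twice rather than replaced.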
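So "Connection: Keep-Alive" is appended to the request headers instead of replacing the existing value. How can I actually override the "Connection" header of a request in Scrapy?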