I am trying to scrape the web page freeproxylists.net, but I get the following error:
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    r = requests.get("http://www.freeproxylists.net/zh/?c=&pt=&pr=HTTPS&a%5B%5D=1&a%5B%5D=2&u=60", headers=headers)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 412, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))
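I added headers to the request, as answers to some similar questions suggested, but it does not help. Some answers say this is a server-side problem, yet the page opens fine in a browser. I think there must be something different about this site, because the same code works on other sites.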
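One way to tell a transient reset from a consistent rejection is to repeat the request a few times and collect the errors. The following is only a minimal sketch: `call_with_retries` is a hypothetical helper, not part of requests, and the backoff values are arbitrary assumptions. It relies on the fact that `requests.exceptions.ConnectionError` derives from `IOError`.

```python
import time


def call_with_retries(fetch, attempts=3, delay=1.0,
                      retry_on=(IOError,), sleep=time.sleep):
    """Call fetch() up to `attempts` times and return (result, errors).

    Hypothetical helper for illustration. requests' ConnectionError
    derives from IOError, so the default `retry_on` covers it.
    """
    errors = []
    for attempt in range(attempts):
        try:
            return fetch(), errors
        except retry_on as exc:
            errors.append(exc)
            # Back off before the next try: delay, 2*delay, 4*delay, ...
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))
    # Every attempt failed; the caller can inspect `errors`.
    return None, errors
```

It could be used as `call_with_retries(lambda: requests.get(url, headers=headers))`. If every attempt fails with the same "Connection reset by peer", the server is probably rejecting the request itself rather than failing intermittently.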
Here is the simplified code that causes the problem:
import requests

headers = {
    # The value must not repeat the header name ("Accept: ...").
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Host": "www.freeproxylists.net",
    "Referer": "http://www.freeproxylists.net",
    # Header values should be strings, not integers.
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "zh-CN,zh;q=0.8,en;q=0.6",
}

# Request the proxy list page with browser-like headers.
r = requests.get("http://www.freeproxylists.net/zh/?c=&pt=&pr=HTTPS&a%5B%5D=1&a%5B%5D=2&u=60", headers=headers)
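When copying headers out of a browser's developer tools, it is easy to paste a malformed value, so a quick sanity check on the dict before sending may be worthwhile. The sketch below is an assumption-laden illustration, not part of requests: `find_header_problems` is a hypothetical helper, it targets Python 3 string types, and it only looks for two mistakes (a non-string value, and a value that repeats its own header name). Passing this check does not mean the server will accept the request.

```python
def find_header_problems(headers):
    """Return a list of human-readable problems found in a headers dict.

    Hypothetical helper for illustration; it checks only two common
    copy-paste mistakes and assumes Python 3 (on Python 2, `unicode`
    values would need to be allowed as well).
    """
    problems = []
    for name, value in headers.items():
        if not isinstance(value, (str, bytes)):
            # Recent versions of requests reject non-string header values.
            problems.append("%s: value %r is %s, not a string"
                            % (name, value, type(value).__name__))
        elif isinstance(value, str) and value.lower().startswith(name.lower() + ":"):
            # e.g. {"Accept": "Accept: text/html"} sends the name twice.
            problems.append("%s: value repeats the header name" % name)
    return problems


# Example with a made-up headers dict:
example = {"Accept": "Accept: text/html", "Upgrade-Insecure-Requests": 1}
for problem in find_header_problems(example):
    print(problem)
```

An empty list only means neither of these two mistakes was found.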