Using Python requests to find which URLs exist: connection error

Time: 2015-11-05 23:45:56

Tags: python url python-3.x python-requests

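I am trying to find out whether each URL in a set exists or returns an error, without having to go through all of them by hand. I am using Python 3.5.0. The basic URL is http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Reporte/2, and it changes by adding a simple number at the end (from 0 up to at most 10000). I tried the following: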

import requests

url = 'http://www5.registraduria.gov.co/CuentasClarasPublicoCon2014/Consultas/Candidato/Reporte/'
start = 0  # Start the count
urllook = url + str(start)  # Add the count to the URL
goodid = []  # IDs of the candidates whose page responded without an error

while start < 10000:  # There seem to be no candidates above 5000; the limit is large just to be sure
    res = requests.get(urllook) #Talk to page again
    if res.ok:  # No HTTP error: record this ID before moving on
        goodid.append(start)
    start = start + 1  # Increase the count by one
    print(start)  # Which page I'm at
    urllook = url + str(start)  # Build the next URL
print("LOL")

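This works for a small number of URLs, for example from 0 to 100, but I want to make sure I have all of them. I would like to save the goodid list to a .txt file that I can access later, but after a while an error that I cannot figure out appears at a random URL. This is the error: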

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 376, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 378, in _make_request
    httplib_response = conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1174, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 282, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 243, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/socket.py", line 571, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 54] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/adapters.py", line 370, in send
    timeout=timeout
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 609, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/util/retry.py", line 245, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/packages/six.py", line 309, in reraise
    raise value.with_traceback(tb)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 378, in _make_request
    httplib_response = conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1174, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 282, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 243, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/socket.py", line 571, in readinto
    return self._sock.recv_into(b)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Gaborio/Dropbox/Fall 2015/Special Interest/Final Paper/Pythondata/thebigone.py", line 15, in <module>
    res = requests.get(urllook) #Talk to page again
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/adapters.py", line 412, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

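It seems that the main problem is that it returns a buffering error, but it does so at different URLs: it happened at /197, at 273 in another run, and at 692 in the last one. How can I fix this error? What does it mean?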

Unrelated, but if anyone has any other suggestions I would welcome them; I am very new to Python and not a programming expert in general.

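Edit: I now understand that "connection reset by peer" means the server closed the connection, but I still do not understand why, and in particular why it happens at random URLs.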

1 Answer:

Answer 0: (score: 0)

Two suggestions:

Use time.sleep between requests: https://docs.python.org/3.0/library/time.html#time.sleep

Use try / except for the errors: https://docs.python.org/3/tutorial/errors.html#handling-exceptions
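
A minimal sketch of how the two suggestions could be combined, in case it helps. The names `check_url`, `find_good_ids` and `save_ids`, the 10-second timeout, the one-second delay and the retry count are all assumptions made for illustration; none of them come from the question. The exception that actually reaches the script in the traceback is `requests.exceptions.ConnectionError`, and the `buffering` TypeError at the top appears to be an internal fallback inside urllib3 rather than the real failure. Since requests' exceptions derive from `OSError`, the sketch catches that.

```python
import time


def check_url(url):
    """Return True if the page answers without an HTTP error (hypothetical helper)."""
    import requests  # third-party; imported here so the loop below can be tried offline
    return requests.get(url, timeout=10).ok


def find_good_ids(base_url, ids, check=check_url, delay=1.0, retries=3, sleep=time.sleep):
    """Return (good, failed): IDs that answered OK and IDs that never answered.

    A dropped connection is retried up to `retries` times with a growing pause.
    """
    good, failed = [], []
    for n in ids:
        for attempt in range(1, retries + 1):
            try:
                if check(base_url + str(n)):
                    good.append(n)
                break  # the server answered (OK or not), so move on to the next ID
            except OSError:  # requests' exceptions and ConnectionResetError derive from OSError
                sleep(delay * attempt)  # wait a little longer before each retry
        else:
            failed.append(n)  # every attempt raised, keep the ID so it can be re-run
        sleep(delay)  # pause between IDs so the server is not flooded
    return good, failed


def save_ids(ids, path):
    """Write one ID per line to a text file."""
    with open(path, "w") as fh:
        fh.writelines("{}\n".format(n) for n in ids)
```

With the URL from the question this would be called as `good, failed = find_good_ids(url, range(10000))` followed by `save_ids(good, "goodid.txt")`. Keeping the failed IDs separate means a reset connection is not mistaken for a page that does not exist.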