To improve my skills I wrote this script: it takes a list of websites, collects them into a collection, then visits each site and crawls it looking for a "contact-us" page. The problem is that my script stops as soon as one of the websites doesn't work; what I want is to skip that site and continue with the rest. Here is my code:
import requests
from bs4 import BeautifulSoup
from urlparse import urlparse
from mechanize import Browser
import re

headers = [('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0')]
urls = 'http://www.officialusa.com/stateguides/chambers/georgia.html'
links_dict = []
response = requests.get(urls, headers)
bsObj = BeautifulSoup(response.text, 'lxml')

for tag in bsObj.find_all('li'):
    links_dict.append(tag.a.get('href'))

for ink in links_dict:
    r = requests.get(ink)
    # get domain name only
    parsed_uri = urlparse(ink)
    domain = parsed_uri.netloc
    br = Browser()
    br.set_handle_robots(False)
    br.addheaders = headers
    try:
        br.open(str(ink))
        for link in br.links():
            siteMatch = re.compile(ink).search(link.url)
            print link.url
    except:
        pass
Everything works fine for the other links. Here is the error:
Traceback (most recent call last):
  File "/home/qunix/PycharmProjects/challange/crawel.py", line 20, in <module>
    r = requests.get(ink)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.quitmangeorgia.org', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7facf68cca50>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
Thank you!!
Answer 0 (score: 0)
Try wrapping

    r = requests.get(ink)

in a try/except, like so:

    try:
        r = requests.get(ink)
    except requests.exceptions.ConnectionError:
        continue

This means that if the call to requests.get throws a ConnectionError, as it does in your example, the loop will move on to the next website in the list. (Note: in Python 2 there is no built-in ConnectionError, so you need to catch requests.exceptions.ConnectionError, or import it explicitly.)
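The skip-on-failure pattern described above can be sketched in isolation. The `process_all` and `fake_fetch` names below are hypothetical stand-ins for your loop body and for `requests.get`, so the logic can be demonstrated without network access:

```python
def process_all(items, fetch):
    """Apply fetch to each item; skip any item whose fetch raises."""
    results = {}
    for item in items:
        try:
            results[item] = fetch(item)
        except Exception:
            # In the real script, catch requests.exceptions.RequestException
            # (the common base of ConnectionError, Timeout, etc.) instead
            # of the overly broad Exception.
            continue
    return results

def fake_fetch(url):
    """Stand-in for requests.get: one 'site' fails to resolve."""
    if "bad" in url:
        raise ConnectionError("name resolution failed")
    return "ok"

print(process_all(["http://a", "http://bad", "http://c"], fake_fetch))
# → {'http://a': 'ok', 'http://c': 'ok'}
```

Catching `requests.exceptions.RequestException` rather than a bare `except:` keeps the loop robust against all request failures (DNS errors, timeouts, refused connections) while still letting unrelated bugs surface.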