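In my app, users enter a URL, and I try to open the link and get the title of the page. But I realized that many different kinds of errors can occur, including unicode characters or newlines in titles, as well as AttributeError and IOError. I first tried to catch each error individually, but now, if a URL fetch error occurs, I want to redirect to an error page where the user will enter the title manually. How do I catch all possible errors? This is the code I have now: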
title = "title"
try:
    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    title = str(soup.html.head.title.string)
    if title == "404 Not Found":
        self.redirect("/urlparseerror")
    elif title == "403 - Forbidden":
        self.redirect("/urlparseerror")
    else:
        title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")
except UnicodeDecodeError:
    self.redirect("/urlparseerror?error=UnicodeDecodeError")
except AttributeError:
    self.redirect("/urlparseerror?error=AttributeError")
#https url:
except IOError:
    self.redirect("/urlparseerror?error=IOError")
#I tried this else clause to catch any other error
#but it does not work
#this is executed when none of the errors above is true:
#
#else:
#    self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")
Update

As @Wooble suggested in the comments, I added try...except when writing title to the database:
try:
    new_item = Main(
        ....
        title = unicode(title, "utf-8"))
    new_item.put()
except UnicodeDecodeError:
    self.redirect("/urlparseerror?error=UnicodeDecodeError")
This works. However, according to the logging info, the out-of-range character "—" is still in title:
***title: 7.2. re — Regular expression operations — Python v2.7.1 documentation**
Do you know why?
Answer 0 (score: 2)

You can use except without specifying any type to catch all exceptions.
From the Python docs, http://docs.python.org/tutorial/errors.html:
import sys
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as (errno, strerror):
    print "I/O error({0}): {1}".format(errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
The last clause catches any exception not caught earlier (that is, any exception that is not an IOError or ValueError).
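As a rough illustration of that ordering, here is a minimal sketch. The helper name `parse_first_int` is hypothetical and not from the docs. Specific handlers are tried first, and the bare except only runs when none of them matched. Unlike the docs example, this sketch does not re-raise, which is usually only acceptable for a demonstration, since a bare except also swallows SystemExit and KeyboardInterrupt.

```python
def parse_first_int(text):
    """Hypothetical helper: return (value, label) describing how parsing ended."""
    try:
        value = int(text.split()[0])
    except ValueError:
        # int() rejected the first token
        return None, "ValueError"
    except:
        # Reached only when no clause above matched
        # (for example IndexError or AttributeError).
        return None, "unexpected"
    return value, "ok"


print(parse_first_int("42 apples"))  # parses the first token
print(parse_first_int("apples"))     # handled by the ValueError clause
print(parse_first_int(""))           # IndexError falls through to the bare except
print(parse_first_int(None))         # AttributeError falls through to the bare except
```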
Answer 1 (score: 2)

You can use the top-level exception type Exception, which will catch any exception not caught earlier.
http://docs.python.org/library/exceptions.html#exception-hierarchy
try:
    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    title = str(soup.html.head.title.string)
    if title == "404 Not Found":
        self.redirect("/urlparseerror")
    elif title == "403 - Forbidden":
        self.redirect("/urlparseerror")
    else:
        title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")
except UnicodeDecodeError:
    self.redirect("/urlparseerror?error=UnicodeDecodeError")
except AttributeError:
    self.redirect("/urlparseerror?error=AttributeError")
#https url:
except IOError:
    self.redirect("/urlparseerror?error=IOError")
except Exception, ex:
    print "Exception caught: %s" % ex.__class__.__name__
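If every failure should end in the same redirect, one option is to move the fetching and parsing into a helper that returns None on any error, so the handler needs only one check. The sketch below is an assumption-laden illustration, not the asker's code: `extract_title` and the `fetch` callable are hypothetical names, and a simple regular expression stands in for BeautifulSoup so the example runs with the standard library alone. Note that except Exception lets SystemExit and KeyboardInterrupt propagate, unlike a bare except.

```python
import re

# Simplified pattern for illustration only; a real HTML parser is more robust.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(fetch, url):
    """Return the stripped page title, or None if anything goes wrong.

    `fetch` is a hypothetical callable that returns the page HTML as text.
    """
    try:
        html = fetch(url)
        match = TITLE_RE.search(html)
        title = match.group(1).strip()  # AttributeError when there is no <title>
    except Exception:
        return None
    return title or None


def failing_fetch(url):
    # Stand-in for a network failure.
    raise IOError("could not reach " + url)


def good_fetch(url):
    return "<html><head><title>\r\n Example page \r\n</title></head></html>"


print(extract_title(good_fetch, "http://example.com"))
print(extract_title(failing_fetch, "http://example.com"))
```

In the request handler, the call site could then be reduced to a single test, such as redirecting to /urlparseerror whenever the helper returns None.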