I fetch a web page with urllib2 and then parse it with lxml. Two things typically go wrong: urllib2.URLError or httplib.IncompleteRead. So I wrote this:
import httplib
import urllib2
from lxml import etree

def get_page(url):
    response = None
    while not response:
        try:
            response = urllib2.urlopen(url)
        except urllib2.URLError:
            response = urllib2.urlopen(url)
        except httplib.IncompleteRead:
            print '**** IncompleteRead for response from %s, retrying' % url
        html_parser = etree.HTMLParser()
        tree = etree.parse(response, html_parser)
    return tree
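There are some obvious problems here:
The first except does exactly the same thing as the try before it.
Whatever happens, lxml will try to parse using response.
So:
What should an except block do here? Is pass acceptable?
Only one action should be attempted inside try, so I am reluctant to move the parsing there. In fact, the function itself should only perform one action, so does the parsing belong in its own function?

Answer 0 (score: 3)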
You can handle these cases with a combination of continue and break statements. continue jumps back to the top of the while loop, and break jumps out of the while loop.
def get_page(url):
    response = None
    tree = None  # stays None if we break out without a usable response
    while not response:
        try:
            response = urllib2.urlopen(url)
        except urllib2.URLError:
            continue  # No response, try again
        except httplib.IncompleteRead:
            print '**** IncompleteRead for response from %s, retrying' % url
            break  # Bad response, don't try again?
        html_parser = etree.HTMLParser()
        tree = etree.parse(response, html_parser)
    return tree
You can also mix in other flow-control tools here, for example the else clause of try, which runs only if no exception occurred in the block:
try:
    pass
except Exception as err:
    print("Don't see this.")
else:
    print("You will see this.")
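
To make the combination of continue, break, and else more concrete, here is a minimal sketch rather than a definitive implementation. It is written for Python 3, where urllib.error and http.client replace urllib2 and httplib. The names fetch_with_retries, opener, and max_attempts are hypothetical and do not come from the question; opener stands in for urlopen so the loop can be exercised without a network, and the attempt limit is an added assumption to keep the loop from running forever.

```python
import http.client
import urllib.error


def fetch_with_retries(url, opener, max_attempts=3):
    """Call opener(url) until it succeeds or the attempts run out.

    opener and max_attempts are hypothetical parameters for this sketch.
    Returns the response, or None if no usable response was obtained.
    """
    for _attempt in range(max_attempts):
        try:
            response = opener(url)
        except urllib.error.URLError:
            continue  # no response at all, try again
        except http.client.IncompleteRead:
            break  # bad response, stop retrying
        else:
            # Runs only when the try block raised nothing.
            return response
    return None
```

Keeping only the call to opener inside try means the else branch is the single place where a successful response is handled, which matches the question's wish to attempt just one action per try.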
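
Answer 1 (score: 1)
I think you want to move the parsing out of the while loop, rather than into the try block. That way the loop keeps trying until it gets a valid response, and parsing is only attempted once the request has succeeded.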
def get_page(url):
    response = None
    while not response:
        try:
            response = urllib2.urlopen(url)
        except urllib2.URLError:
            print '**** URLError for response from %s, retrying' % url
        except httplib.IncompleteRead:
            print '**** IncompleteRead for response from %s, retrying' % url
    html_parser = etree.HTMLParser()
    tree = etree.parse(response, html_parser)
    return tree
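I also updated the except block for URLError so that it is essentially the same as the IncompleteRead block. I am not really sure this is appropriate, since some URLErrors cannot be fixed by retrying (for example, if the server does not exist, that is unlikely to change while you are retrying). If it should be a fatal error (at least fatal to this function), you may want to raise in the except block rather than letting the loop continue. Here is a version that treats URLErrors more seriously than IncompleteRead: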
def get_page(url):
    response = None
    while not response:
        try:
            response = urllib2.urlopen(url)
        except urllib2.URLError:
            print '**** URLError for response from %s, giving up' % url
            raise
        except httplib.IncompleteRead:
            print '**** IncompleteRead for response from %s, retrying' % url
    html_parser = etree.HTMLParser()
    tree = etree.parse(response, html_parser)
    return tree
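The raise keyword on its own (with no expression after it) re-raises the current exception. You could also raise a different error if that makes more sense in your application (for example, one indicating that the supplied URL is unsuitable).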
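
As an illustration of raising a different error, the following is a rough sketch under stated assumptions, written for Python 3 (urllib.error instead of urllib2). PageFetchError, open_or_fail, and the opener parameter are hypothetical names invented for this example; opener stands in for urlopen so the behaviour can be checked without a network.

```python
import urllib.error


class PageFetchError(Exception):
    """Hypothetical application-level error used only in this sketch."""


def open_or_fail(url, opener):
    """Return opener(url), translating URLError into PageFetchError."""
    try:
        return opener(url)
    except urllib.error.URLError as err:
        # "from err" keeps the original exception available as __cause__.
        raise PageFetchError("could not fetch %s" % url) from err
```

A caller can then catch PageFetchError without knowing which library did the fetching, while the original URLError remains reachable for debugging.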