I have a large file containing several thousand links. I have written a script that works through it line by line, requests each link, and runs various analyses on the corresponding web page. However, sometimes a link is broken (the article was removed from the site, etc.), and at that point my whole script stops.
Is there a way around this? Here is my (pseudo)code:
import urllib2
import lxml.html

for row in file:
    url = row[4]
    req = urllib2.Request(url)
    tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    # perform analyses
    # append analyses results to lists
# output data
I tried

except:
    pass

but for some reason it seems to muddle the script.
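Presumably the bare except is too broad: it swallows every exception, including mistakes in my analysis code and even KeyboardInterrupt, so unrelated failures vanish silently. A minimal sketch of the difference, assuming Python 2 to match the urllib2 usage above (the url value is a hypothetical stand-in for row[4]):

import urllib2
import lxml.html

url = 'http://example.com'  # hypothetical stand-in for row[4]
req = urllib2.Request(url)

try:
    tree = lxml.html.fromstring(urllib2.urlopen(req).read())
except:
    pass  # swallows *everything*: bad links, typos in the code, even Ctrl-C

try:
    tree = lxml.html.fromstring(urllib2.urlopen(req).read())
except urllib2.URLError:
    pass  # swallows only failed requests; real bugs still surface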
Answer 0 (score: 2)
Works for me:
import urllib2
import lxml.html
from urllib2 import URLError

for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        # perform analyses
        # append analyses results to lists
    except URLError:
        pass  # skip problematic links
# output data
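One detail worth knowing: in urllib2, HTTPError is a subclass of URLError, so this except clause also covers pages that return error codes such as 404. Silently passing does hide which links failed, though. A small variation that records them for later inspection (the failed_urls list is an illustrative addition, not part of the answer above):

import urllib2
import lxml.html
from urllib2 import URLError

failed_urls = []  # hypothetical: collect links that could not be fetched

for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        # perform analyses
        # append analyses results to lists
    except URLError as e:
        failed_urls.append((url, str(e)))  # remember the bad link and why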
Answer 1 (score: 0)
A try block is the way to go:
import urllib2
import lxml.html
from urllib2 import URLError

for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except URLError:
        continue  # broken link: move on to the next row
    # perform analyses
    # append analyses results to lists
# output data
The continue lets you skip any further computation for that row once the URL check fails and restart at the next iteration of the loop.
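A related failure mode, beyond what this answer covers: a link that is technically alive but never responds will hang urlopen indefinitely rather than raise an error. Assuming Python 2.6+, where urlopen accepts a timeout argument, a sketch that also bails out on slow servers (socket.timeout is caught separately because a timeout during the read can raise it directly):

import socket
import urllib2
import lxml.html
from urllib2 import URLError

for row in file:
    url = row[4]
    try:
        req = urllib2.Request(url)
        # give up after 10 seconds instead of hanging forever
        tree = lxml.html.fromstring(urllib2.urlopen(req, timeout=10).read())
    except (URLError, socket.timeout):
        continue  # dead or unresponsive link: skip this row
    # perform analyses
    # append analyses results to lists
# output data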