I have two functions that each work fine on their own, but they seem to break when I nest them together.
def scrape_all_pages(alphabet):
    pages = get_all_urls(alphabet)
    for page in pages:
        scrape_table(page)
I'm trying to scrape some search results systematically, so get_all_urls() builds a list of URLs for each letter of the alphabet. Sometimes there are thousands of pages, but that works fine. Then, for each page, scrape_table grabs just the table I'm interested in. That also works fine. I can run the whole thing and it works, but I'm running it on ScraperWiki, and if I set it going and walk away it always gives me a "list index out of range" error. That is definitely a problem on ScraperWiki's side, but I'd like to work around it by adding some try/except clauses and logging the error when I hit the problem. Something like:
def scrape_all_pages(alphabet):
    try:
        pages = get_all_urls(alphabet)
    except:
        ## LOG THE ERROR IF THAT FAILS.
    try:
        for page in pages:
            scrape_table(page)
    except:
        ## LOG THE ERROR IF THAT FAILS
But I can't figure out how to log the error generically. Besides, the above looks clunky, and in my experience, when something looks clunky, Python has a better way. Is there a better way?
Answer 0 (score: 2)
You can specify the particular type of exception to catch, along with a variable to hold the exception instance:
def scrape_all_pages(alphabet):
    try:
        pages = get_all_urls(alphabet)
        for page in pages:
            scrape_table(page)
    except IndexError as error:
        # Will only catch IndexError
        print(error)
    except Exception as error:
        # Will catch any other exception
        print(error)
Catching Exception will catch every error, since they should all inherit from Exception.
This is the only way I know of to catch errors.
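To make the ordering concrete, here is a small runnable sketch (the helper name `describe_failure` is illustrative, not from the answer): except clauses are tried top to bottom, so the specific clause must come before the generic one, or it will never be reached.

```python
def describe_failure(func):
    """Call func() and report which except clause caught its error."""
    try:
        func()
    except IndexError as error:
        # The specific clause must come first; Exception below would match too
        return "IndexError clause: %r" % (error,)
    except Exception as error:
        return "generic clause: %r" % (error,)
    return "no error"
```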
Answer 1 (score: 2)
Wrap the logging in a context manager; you can easily change the details to suit your requirements:
import traceback

# This is a context manager
class LogError(object):
    def __init__(self, logfile, message):
        self.logfile = logfile
        self.message = message
    def __enter__(self):
        return self
    def __exit__(self, exc_type, value, tb):
        if exc_type is None or not issubclass(exc_type, Exception):
            # Allow KeyboardInterrupt and other non-Exception errors to pass through
            return
        self.logfile.write("%s: %r\n" % (self.message, value))
        traceback.print_exception(exc_type, value, tb, file=self.logfile)
        return True  # "swallow" the exception

# This is a helper class to maintain an open file object and
# a way to provide extra information to the context manager.
class ExceptionLogger(object):
    def __init__(self, filename):
        self.logfile = open(filename, "a")
    def __call__(self, message):
        # Override the call syntax so that I can specify a message
        return LogError(self.logfile, message)
The key part is that __exit__ can return True, in which case the exception is ignored and the program carries on. The code also has to be a bit careful, since a KeyboardInterrupt (Ctrl-C), SystemExit, or other non-standard exception might be raised, and in those cases you really do want the program to stop.
You can use the above in your code like this:
elog = ExceptionLogger("/dev/tty")

with elog("Can I divide by 0?"):
    1 // 0

for i in range(-4, 4):
    with elog("Divisor is %d" % (i,)):
        print("5/%d = %d" % (i, 5 // i))
That snippet gives me the output:
Can I divide by 0?: ZeroDivisionError('integer division or modulo by zero')
Traceback (most recent call last):
  File "exception_logger.py", line 24, in <module>
    1 // 0
ZeroDivisionError: integer division or modulo by zero
5/-4 = -2
5/-3 = -2
5/-2 = -3
5/-1 = -5
Divisor is 0: ZeroDivisionError('integer division or modulo by zero')
Traceback (most recent call last):
  File "exception_logger.py", line 28, in <module>
    print("5/%d = %d" % (i, 5 // i))
ZeroDivisionError: integer division or modulo by zero
5/1 = 5
5/2 = 2
5/3 = 1
I think it's easy to see how you could modify this code to handle only IndexError exceptions, or even to pass in a base exception type to catch.
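As a more compact variant (a sketch, not part of the original answer), the same log-and-swallow behaviour can be written with `contextlib.contextmanager`; catching only `Exception` gives the same pass-through for KeyboardInterrupt and SystemExit, since those derive from BaseException:

```python
import sys
import traceback
from contextlib import contextmanager

@contextmanager
def log_errors(message, logfile=sys.stderr):
    """Log any Exception raised inside the with-block, then swallow it."""
    try:
        yield
    except Exception as error:
        # KeyboardInterrupt and SystemExit derive from BaseException,
        # not Exception, so they still propagate and stop the program.
        logfile.write("%s: %r\n" % (message, error))
        traceback.print_exc(file=logfile)

# usage: the error is logged to stderr and execution continues
with log_errors("Can I divide by 0?"):
    1 // 0
```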
Answer 2 (score: 0)
Maybe log the error on each iteration, so that an error in one iteration doesn't break your loop:
for page in pages:
    try:
        scrape_table(page)
    except Exception as error:
        # Open the error log file for appending and write the error:
        with open("errors.txt", "a") as f:
            # A message specific to this iteration (page) should be added here...
            f.write("Error occurred: %r\n" % (error,))
Answer 3 (score: 0)
That's a good approach, but you shouldn't use a bare except clause; you need to specify the type of exception you're trying to catch. You can also catch an error and continue the loop.
def scrape_all_pages(alphabet):
    try:
        pages = get_all_urls(alphabet)
    except IndexError:  # IndexError is an example
        ## LOG THE ERROR IF THAT FAILS.
        pages = []  # without this, the loop below would raise a NameError
    for page in pages:
        try:
            scrape_table(page)
        except IndexError:  # IndexError is an example
            pass  ## LOG THE ERROR IF THAT FAILS and continue this loop
Answer 4 (score: 0)
It would be better to write it like this:
try:
    pages = get_all_urls(alphabet)
except IndexError:
    ## LOG THE ERROR IF THAT FAILS.
    pages = []  # so the loop below has something to iterate over

for page in pages:
    try:
        scrape_table(page)
    except IndexError:
        ## LOG THE ERROR IF THAT FAILS
        continue  # this will bring you to the next item in the for loop
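Putting the pieces together, here is one way the whole function could look with specific exceptions and the standard `logging` module. This is a sketch: passing `get_all_urls` and `scrape_table` as parameters is only done to keep the example self-contained (in the question they are module-level functions), and the logger name is an assumption.

```python
import logging

logger = logging.getLogger("scraper")

def scrape_all_pages(alphabet, get_all_urls, scrape_table):
    """Scrape every page, logging IndexErrors instead of crashing."""
    try:
        pages = get_all_urls(alphabet)
    except IndexError:
        logger.exception("Could not build the URL list")
        return  # nothing to iterate over
    for page in pages:
        try:
            scrape_table(page)
        except IndexError:
            # logger.exception records the traceback for this page
            logger.exception("Failed to scrape %r", page)
            continue  # move on to the next page
```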