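So, I suspect my use case is a good one...
Here is some code from one of my projects that extracts text from PDF files. Processing involves three steps.
I recently learned about context managers and the with statement, and this seemed like a good use case for them. So I started by defining the PDFMinerWrapper class: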
# import path of the older pdfminer API used below (an assumption;
# it is not shown in the question)
from pdfminer.pdfparser import PDFParser, PDFDocument

class PDFMinerWrapper(object):
    '''
    Usage:
    with PDFMinerWrapper('/path/to/file.pdf') as doc:
        doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        self.pdf = open(self.pdf_doc, 'rb')
        parser = PDFParser(self.pdf)  # create a parser object associated with the file object
        doc = PDFDocument()  # create a PDFDocument object that stores the document structure
        parser.set_document(doc)  # connect the parser and document objects
        doc.set_parser(parser)
        doc.initialize(self.pdf_pwd)  # pass '' if no password required
        return doc

    def __exit__(self, type, value, traceback):
        self.pdf.close()
        # if we have an error, catch it, log it, and return the info
        if isinstance(value, Exception):
            self.logError()  # logError is not shown in the question
            print traceback
            # note: a truthy return value from __exit__ suppresses the exception
            return value
Now I can easily work with a PDF file and be sure that errors are handled gracefully. In theory, all I need to do is something like this:
with PDFMinerWrapper('/path/to/pdf') as doc:
    foo(doc)
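How "gracefully" errors are handled depends on what __exit__ returns. The following is a minimal Python 3 sketch (the thread's own snippets are Python 2) that uses a hypothetical stand-in class, ManagedResource, and a hypothetical helper, run, rather than PDFMiner itself. It shows that cleanup runs either way, and that only a truthy return value swallows the exception:

```python
class ManagedResource:
    """Hypothetical stand-in for PDFMinerWrapper; it opens nothing."""

    def __init__(self, suppress):
        self.suppress = suppress  # whether __exit__ should swallow errors
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.closed = True  # cleanup always runs, error or not
        # A truthy return value tells Python the exception was handled.
        return self.suppress and exc_type is not None


def run(suppress):
    """Raise inside the with block and report what the caller observes."""
    resource = ManagedResource(suppress)
    try:
        with resource:
            raise ValueError("boom")
    except ValueError:
        return "propagated", resource.closed
    return "suppressed", resource.closed
```

With this sketch, run(True) reports the error as suppressed and run(False) reports it as propagated; in both cases the resource is marked closed. Returning the exception object itself, as the wrapper above does, counts as a truthy return.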
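This is great, except that I need to check that the PDF document is extractable before applying a function to the object returned by PDFMinerWrapper. My current solution involves an intermediate step.
I'm working with a class I call Pamplemousse, which serves as an interface for working with the PDFs. It, in turn, uses PDFMinerWrapper each time an operation must be performed on the file to which the object is linked.
Here is some (abridged) code that demonstrates its use: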
class Pamplemousse(object):
    def __init__(self, inputfile, passwd='', enc='utf-8'):
        self.pdf_doc = inputfile
        self.passwd = passwd
        self.enc = enc

    def with_pdf(self, fn, *args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)
        return result

    def _parse_toc(self, doc):
        toc = []
        try:
            toc = [(level, title) for level, title, dest, a, se in doc.get_outlines()]
        except PDFNoOutlines:
            pass
        return toc

    def get_toc(self):
        return self.with_pdf(self._parse_toc)
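Each time I want to perform an operation on the PDF file, I pass the relevant function and its arguments to the with_pdf method. The with_pdf method, in turn, uses the with statement to take advantage of PDFMinerWrapper's context manager (thus ensuring graceful handling of exceptions) and performs the check before actually applying the function it was passed.
My question is as follows:
I would like to simplify this code so that I don't have to call Pamplemousse.with_pdf explicitly. My understanding is that decorators could help here, so: how would I write a decorator that applies the with statement and performs the extractability check?

Answer 0 (score: 1)
The way I interpret your goal is that you want to define multiple methods on your Pamplemousse class without constantly having to wrap them in that call. Here is a heavily simplified version: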
def if_extractable(fn):
    # this expects to be wrapping a Pamplemousse object
    def wrapped(self, *args):
        print "wrapper(): Calling %s with" % fn, args
        result = None
        with PDFMinerWrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                result = fn(self, doc, *args)
        return result
    return wrapped

class Pamplemousse(object):
    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    # get_toc will only get called if the wrapper check
    # passes the extractable test
    @if_extractable
    def get_toc(self, doc, *args):
        print "get_toc():", self, doc, args
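The decorator if_extractable is defined as a plain function, but it expects to be used on instance methods of your class.
The decorated get_toc, which used to delegate to a private method, now simply expects to receive a doc object and the args if the check passes. Otherwise it is not called and the wrapper returns None.
This way you can keep defining your operation functions to expect a doc.
You could even add some type checking to make sure it is wrapping the expected class: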
def if_extractable(fn):
    def wrapped(self, *args):
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        ...
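To try this approach without PDFMiner installed, here is a minimal Python 3 sketch. FakeDoc and FakeWrapper are hypothetical stand-ins for the real document and for PDFMinerWrapper, describe is an invented example method, and functools.wraps is an addition that keeps the decorated method's name and docstring intact:

```python
import functools


class FakeDoc:
    """Hypothetical stand-in for the object PDFMinerWrapper yields."""
    def __init__(self, is_extractable):
        self.is_extractable = is_extractable


class FakeWrapper:
    """Hypothetical stand-in context manager; the 'file' is the doc itself."""
    def __init__(self, pdf_doc):
        self.pdf_doc = pdf_doc

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False  # never suppress exceptions


def if_extractable(fn):
    @functools.wraps(fn)  # preserve fn's name and docstring
    def wrapped(self, *args, **kwargs):
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        with FakeWrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                return fn(self, doc, *args, **kwargs)
        return None  # the check failed, fn was never called
    return wrapped


class Pamplemousse:
    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    @if_extractable
    def describe(self, doc, label):
        return "%s: extractable=%s" % (label, doc.is_extractable)
```

Callers write Pamplemousse(...).describe("label") and never mention doc; the wrapper supplies it, or returns None when the document is not extractable.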

Answer 1 (score: 0)
A decorator is just a function that takes a function and returns another one. You can do anything you like inside it:
def my_func():
    return 'banana'

def my_decorator(f):  # see, it takes a function as an argument
    def wrapped():
        res = None
        # pdf_doc and passwd are assumed to be defined elsewhere
        with PDFMinerWrapper(pdf_doc, passwd) as doc:
            res = f()
        return res
    return wrapped  # see, I return a function that also calls f
Now, if you apply the decorator:
@my_decorator
def my_func():
    return 'banana'
then the wrapped function replaces my_func, so the extra code is called.
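The replacement can be observed directly. This Python 3 sketch swaps the PDF handling for a simple call log (the calls list and the plain function are hypothetical additions) and uses functools.wraps so the replaced function keeps its original name:

```python
import functools

calls = []  # records every call that goes through the wrapper


def my_decorator(f):
    @functools.wraps(f)  # copy f's name and docstring onto the wrapper
    def wrapped(*args, **kwargs):
        calls.append(f.__name__)  # the "extra code"
        return f(*args, **kwargs)
    return wrapped


def plain():
    return 'banana'


@my_decorator
def my_func():
    return 'banana'


# The @ syntax is shorthand for rebinding a name by hand:
also_decorated = my_decorator(plain)
```

After this runs, the name my_func refers to wrapped, and the undecorated function is still reachable as my_func.__wrapped__, an attribute that functools.wraps sets.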

Answer 2 (score: 0)
You might want to try something along these lines:
def with_pdf(self, fn, *args):
    def wrappedfunc(*args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)
        return result
    return wrappedfunc
Then, whenever you need to wrap a function, just do this:
@pamplemousseinstance.with_pdf
def foo(doc, *args):
    print 'I am doing stuff with', doc
    print 'I also got some good args. Take a look!', args
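Using a bound method as a decorator works because pamplemousseinstance.with_pdf is an ordinary callable that takes a function and returns one. A minimal Python 3 sketch, with a hypothetical fake_wrapper context manager and a SimpleNamespace object standing in for PDFMinerWrapper and the document:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def fake_wrapper(pdf_doc, passwd=''):
    """Hypothetical stand-in for PDFMinerWrapper: yields the doc unchanged."""
    yield pdf_doc


class Pamplemousse:
    def __init__(self, inputfile, passwd=''):
        self.pdf_doc = inputfile
        self.passwd = passwd

    def with_pdf(self, fn):
        # self is captured by the closure at decoration time
        def wrappedfunc(*args):
            with fake_wrapper(self.pdf_doc, self.passwd) as doc:
                if doc.is_extractable:
                    return fn(doc, *args)
            return None
        return wrappedfunc


pamplemousseinstance = Pamplemousse(SimpleNamespace(is_extractable=True))


@pamplemousseinstance.with_pdf
def foo(doc, *args):
    return (doc.is_extractable, args)
```

One consequence of this design is that foo is tied to that single instance from the moment it is decorated, so it suits module-level helper functions better than methods shared by many Pamplemousse objects.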

Answer 3 (score: 0)
Here is some demonstration code:
#! /usr/bin/python

class Doc(object):
    """Dummy PDF document object"""
    is_extractable = True
    text = ''

class PDFMinerWrapper(object):
    '''
    Usage:
    with PDFMinerWrapper('/path/to/file.pdf') as doc:
        doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        # dummy version: hand back whatever was passed in
        return self.pdf_doc

    def __exit__(self, type, value, traceback):
        pass
def safe_with_pdf(fn):
    """
    This is the decorator; it gets passed the fn we want
    to decorate.
    It is applied while the class body is being built, so it
    receives only the function. The instance (self) shows up
    later, as the first argument of the wrapper.
    """
    print "---- Decorator ----"
    print "safe_with_pdf: First arg (fn):", fn

    def wrapper(self, *args, **kargs):
        """
        This will get passed the function's arguments and kargs,
        which means that we can intercept them here.
        """
        print "--- We are now in the wrapper ---"
        print "wrapper: First arg (self):", self
        print "wrapper: Other args (*args):", args
        print "wrapper: Other kargs (**kargs):", kargs
        # fn is accessible because this function is a closure,
        # so it still has access to the decorator's local
        # variables.
        print "wrapper: The function we run (fn):", fn
        # This wrapper is now pretending to be the original function
        # Perform all the checks and stuff
        with PDFMinerWrapper(self.pdf, self.passwd) as doc:
            if doc.is_extractable:
                # Now call the original function with its
                # arguments and pass it the doc
                result = fn(doc, *args, **kargs)
            else:
                result = None
        print "--- End of the Wrapper ---"
        return result

    # Decorators are expected to return a function; this
    # function is then run instead of the decorated function.
    # So instead of returning the original function we return the
    # wrapper. The wrapper will be run with the original function's
    # arguments.
    # Because of the closure we can still reach the original
    # function by looking up fn (the argument that was passed
    # to this function) inside the wrapper.
    print "--- Decorator ---"
    return wrapper
class SomeKlass(object):
    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        print ''
        print "-- The Function --"
        # This function is now passed the doc from the wrapper.
        print 'The contents of the pdf:', doc.text
        print 'some_argument', some_argument
        print "-- End of the Function --"
        print ''

doc = Doc()
doc.text = 'PDF contents'
klass = SomeKlass()
klass.pdf = doc
klass.passwd = ''
klass.pdf_thing('arg')
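I suggest running that code to see how it works. A few interesting points to look out for:
First, you will notice that we pass only a single argument to pdf_thing(), yet if you look at the method it takes two arguments: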
    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        print ''
        print "-- The Function --"
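That is because of what happens in the wrapper that every decorated function goes through:
        with PDFMinerWrapper(self.pdf, self.passwd) as doc:
            if doc.is_extractable:
                # Now call the original function with its
                # arguments and pass it the doc
                result = fn(doc, *args, **kargs)
We generate the doc argument and pass it in along with the original arguments (*args, **kargs). This means that every method or function wrapped with this decorator receives an additional argument (doc) on top of the arguments the caller supplies, which is why the declaration reads def pdf_thing(doc, some_argument):.
Another thing to note is that the wrapper: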
    def wrapper(self, *args, **kargs):
        """
        This will get passed the function's arguments and kargs,
        which means that we can intercept them here.
        """
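also captures the self argument but does not pass it on to the method being called. You can change this behaviour by changing:
                result = fn(doc, *args, **kargs)
            else:
                result = None
to:
                result = fn(self, doc, *args, **kargs)
            else:
                result = None
and then changing the method itself to:
    def pdf_thing(self, doc, some_argument):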
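The two variants can be compared side by side. The following Python 3 sketch uses hypothetical names throughout: doc_only and self_and_doc are the two decorator flavours, fake_wrapper stands in for PDFMinerWrapper, and the name attribute exists only to show that self arrives:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def fake_wrapper(pdf, passwd):
    """Hypothetical stand-in for PDFMinerWrapper: yields the doc unchanged."""
    yield pdf


def doc_only(fn):
    """Wrapper keeps self to itself; fn receives only doc plus the caller's args."""
    def wrapper(self, *args, **kargs):
        with fake_wrapper(self.pdf, self.passwd) as doc:
            return fn(doc, *args, **kargs) if doc.is_extractable else None
    return wrapper


def self_and_doc(fn):
    """Wrapper forwards self as well, so fn looks like a normal method."""
    def wrapper(self, *args, **kargs):
        with fake_wrapper(self.pdf, self.passwd) as doc:
            return fn(self, doc, *args, **kargs) if doc.is_extractable else None
    return wrapper


class SomeKlass:
    def __init__(self, pdf, passwd=''):
        self.pdf = pdf
        self.passwd = passwd
        self.name = 'klass'

    @doc_only
    def without_self(doc, some_argument):
        return (doc.text, some_argument)

    @self_and_doc
    def with_self(self, doc, some_argument):
        return (self.name, doc.text, some_argument)


klass = SomeKlass(SimpleNamespace(is_extractable=True, text='PDF contents'))
```

Both methods are called the same way, for example klass.with_self('arg'); only the second can reach instance state such as self.name.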
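Hope that helps; feel free to ask for more clarification.
EDIT:
To answer the second part of your question:
Yes, the decorator can be defined inside the class. Just place safe_with_pdf inside SomeKlass, above any use of it, e.g. as the first method in the class.
Here is also a reduced version of the code above, with the decorator inside the class.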
class SomeKlass(object):
    def safe_with_pdf(fn):
        """The decorator which will wrap the method"""
        def wrapper(self, *args, **kargs):
            """The wrapper which calls the method if the doc is extractable"""
            with PDFMinerWrapper(self.pdf, self.passwd) as doc:
                if doc.is_extractable:
                    result = fn(doc, *args, **kargs)
                else:
                    result = None
            return result
        return wrapper

    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        """The method to decorate"""
        print 'The contents of the pdf:', doc.text
        print 'some_argument', some_argument
        return '%s - Result' % doc.text

# klass is set up as in the demonstration code above
print klass.pdf_thing('arg')
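The in-class decorator works because, while the class body is executing, safe_with_pdf is still a plain function in the class namespace and is called with fn only. A Python 3 sketch of the same idea follows; fake_wrapper is a hypothetical stand-in for PDFMinerWrapper, and the final del is an optional addition, not part of the answer above:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def fake_wrapper(pdf, passwd):
    """Hypothetical stand-in for PDFMinerWrapper: yields the doc unchanged."""
    yield pdf


class SomeKlass:
    def safe_with_pdf(fn):
        # The class does not exist yet at this point, so this is an
        # ordinary function call: it receives fn, not an instance.
        def wrapper(self, *args, **kargs):
            with fake_wrapper(self.pdf, self.passwd) as doc:
                if doc.is_extractable:
                    return fn(doc, *args, **kargs)
            return None
        return wrapper

    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        return '%s - %s' % (doc.text, some_argument)

    # Optional: drop the helper so it does not linger as a confusing method.
    del safe_with_pdf


klass = SomeKlass()
klass.pdf = SimpleNamespace(is_extractable=True, text='PDF contents')
klass.passwd = ''
```

Without the del, safe_with_pdf would remain on the class, and calling it on an instance would pass that instance in as fn, which is rarely what anyone wants.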