Choose Python function to call based on a regex

Asked: 2011-07-08 20:05:00

Tags: python anonymous-function lambda

Is it possible to put a function in a data structure without first giving it a name with def?

# This is the behaviour I want. Prints "hi".
def myprint(msg):
    print msg
f_list = [ myprint ]
f_list[0]('hi')
# The word "myprint" is never used again. Why litter the namespace with it?

The body of a lambda function is severely limited (it can only be a single expression), so I can't use them.

Edit: For reference, this is closer to the real code where I ran into the problem.

def handle_message( msg ):
    print msg
def handle_warning( msg ):
    global num_warnings, num_fatals
    num_warnings += 1
    if ( is_fatal( msg ) ):
        num_fatals += 1
handlers = (
    ( re.compile( r'^<\w+> (.*)' ), handle_message ),
    ( re.compile( r'^\*{3} (.*)' ), handle_warning ),
)
# There are really 10 or so handlers, of similar length.
# The regexps are uncomfortably separated from the handler bodies,
# and the code is unnecessarily long.

for line in open( "log" ):
    for ( regex, handler ) in handlers:
        m = regex.search( line )
        if ( m ): handler( m.group(1) )

14 answers:

Answer 0 (score: 39)

This is based on Udi's nice answer.

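I think the difficulty of creating anonymous functions is a bit of a red herring. What you really want is to keep related code together and keep the code neat. So I think decorators might work for you.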

import re

# List of pairs (regexp, handler)
handlers = []

def handler_for(regexp):
    """Declare a function as handler for a regular expression."""
    def gethandler(f):
        handlers.append((re.compile(regexp), f))
        return f
    return gethandler

@handler_for(r'^<\w+> (.*)')
def handle_message(msg):
    print msg

@handler_for(r'^\*{3} (.*)')
def handle_warning(msg):
    global num_warnings, num_fatals
    num_warnings += 1
    if is_fatal(msg):
        num_fatals += 1
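
The snippet above only registers the handlers; the dispatch loop from the question still has to run over them. A minimal Python 3 sketch of the whole round trip follows. The `seen` list, the `dispatch` helper, and the sample lines are assumptions made for illustration and are not part of the original answer.

```python
import re

handlers = []  # list of (compiled regex, handler) pairs

def handler_for(regexp):
    """Register the decorated function as the handler for `regexp`."""
    def register(f):
        handlers.append((re.compile(regexp), f))
        return f
    return register

seen = []  # hypothetical sink so the example has something to inspect

@handler_for(r'^<\w+> (.*)')
def handle_message(msg):
    seen.append(('message', msg))

@handler_for(r'^\*{3} (.*)')
def handle_warning(msg):
    seen.append(('warning', msg))

def dispatch(lines):
    """Call every handler whose regex matches a line, passing group 1."""
    for line in lines:
        for regex, handler in handlers:
            m = regex.search(line)
            if m:
                handler(m.group(1))

# Hypothetical log lines; the last one matches no handler and is ignored.
dispatch(['<alice> hello', '*** disk full', 'unmatched line'])
```

Each regex now sits directly above the body it triggers, which is the point of the decorator approach. The function names still exist in the module namespace, but nothing else needs to refer to them.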

Answer 1 (score: 16)

A nicer, DRY way to solve your actual problem:

def message(msg):
    print msg
message.re = r'^<\w+> (.*)'

def warning(msg):
    global num_warnings, num_fatals
    num_warnings += 1
    if ( is_fatal( msg ) ):
        num_fatals += 1
warning.re = r'^\*{3} (.*)'

handlers = [(re.compile(x.re), x) for x in [
        message,
        warning,
        foo,  # foo, bar, baz stand in for further handlers defined the same way
        bar,
        baz,
    ]]

Answer 2 (score: 14)

Continuing Gareth's clean approach with a modular, self-contained solution:

import re

# in util.py
class GenericLogProcessor(object):

    def __init__(self):
        self.handlers = []  # List of pairs (regexp, handler)

    def register(self, regexp):
        """Declare a function as handler for a regular expression."""
        def gethandler(f):
            self.handlers.append((re.compile(regexp), f))
            return f
        return gethandler

    def process(self, file):
        """Process a file line by line and execute all handlers by registered regular expressions"""
        for line in file:
            for regex, handler in self.handlers:
                m = regex.search(line)
                if m:
                    handler(m.group(1))

# in log_processor.py
log_processor = GenericLogProcessor()

@log_processor.register(r'^<\w+> (.*)')
def handle_message(msg):
    print msg

@log_processor.register(r'^\*{3} (.*)')
def handle_warning(msg):
    global num_warnings, num_fatals
    num_warnings += 1
    if is_fatal(msg):
        num_fatals += 1

# in your code
with open("1.log") as f:
    log_processor.process(f)

Answer 3 (score: 13)

If you want to keep a clean namespace, use del:

def myprint(msg):
    print msg
f_list = [ myprint ]
del myprint
f_list[0]('hi')

Answer 4 (score: 9)

As you said, this can't be done. But you can approximate it.

def create_printer():
  def myprint(x):
    print x
  return myprint

x = create_printer()

myprint is effectively anonymous here, because the caller can no longer reach the variable scope in which it was created. (See closures in Python.)
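
A small Python 3 sketch of what "effectively anonymous" means in practice. The returned string and the `name_leaked` check are assumptions added so the behaviour can be inspected; they are not part of the original answer.

```python
def create_printer():
    # myprint is bound only in create_printer's local scope.
    def myprint(x):
        return 'printed: {0}'.format(x)
    return myprint

printer = create_printer()
result = printer('hi')

# The name never reaches the module namespace...
name_leaked = 'myprint' in globals()

# ...although the function object still remembers the name it was defined with.
internal_name = printer.__name__
```

So the namespace stays clean, but the function is not anonymous in the strict sense: its `__name__` is still `myprint`, which is what shows up in tracebacks.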

Answer 5 (score: 6)

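If you are worried about polluting the namespace, create your functions inside another function. Then you only "pollute" the local namespace of the create_functions function, not the outer namespace.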

def create_functions():
    def myprint(msg):
        print msg
    return [myprint]

f_list = create_functions()
f_list[0]('hi')

Answer 6 (score: 4)

You should not do this, because eval is evil, but you can compile function code at run time using FunctionType and compile:

>>> def f(msg): print msg
>>> type(f)
 <type 'function'>
>>> help(type(f))
...
class function(object)
 |  function(code, globals[, name[, argdefs[, closure]]])
 |
 |  Create a function object from a code object and a dictionary.
 |  The optional name string overrides the name from the code object.
 |  The optional argdefs tuple specifies the default argument values.
 |  The optional closure tuple supplies the bindings for free variables.    
...

>>> help(compile)
Help on built-in function compile in module __builtin__:

compile(...)
    compile(source, filename, mode[, flags[, dont_inherit]]) -> code object

    Compile the source string (a Python module, statement or expression)
    into a code object that can be executed by the exec statement or eval().
    The filename will be used for run-time error messages.
    The mode must be 'exec' to compile a module, 'single' to compile a
    single (interactive) statement, or 'eval' to compile an expression.
    The flags argument, if present, controls which future statements influence
    the compilation of the code.
    The dont_inherit argument, if non-zero, stops the compilation inheriting
    the effects of any future statements in effect in the code calling
    compile; if absent or zero these statements do influence the compilation,
    in addition to any features explicitly specified.
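
The two help texts above can be combined roughly as follows. This is a Python 3 sketch, not a recommendation: the source string and the name `anon` are made up for illustration, and pulling the function's code object out of `co_consts` relies on how CPython happens to compile a `def`, which is an assumption rather than a documented guarantee.

```python
import types

# Hypothetical function source; "_" is a throwaway name that never gets bound.
source = "def _(msg):\n    return 'got ' + msg\n"

# Compile the source as a module. Nothing is executed at this point.
module_code = compile(source, '<generated>', 'exec')

# Assumption (CPython detail): the code object for the def body is stored
# among the module code object's constants.
func_code = next(c for c in module_code.co_consts
                 if isinstance(c, types.CodeType))

# Build a function object directly from the code object and a globals dict.
anon = types.FunctionType(func_code, globals(), '<anonymous>')
result = anon('hi')
```

Because the `def` statement itself is never executed, no name is bound anywhere; the price is code that lives in a string and depends on interpreter internals.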

Answer 7 (score: 3)

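As everyone has said, lambda is the only way. Rather than dwelling on lambda's limitations, think about how to work around them: for example, you can use lists, dicts, comprehensions and so on to do what you want: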

funcs = [lambda x,y: x+y, lambda x,y: x-y, lambda x,y: x*y, lambda x: x]
funcs[0](1,2)
>>> 3
funcs[1](funcs[0](1,2),funcs[0](2,2))
>>> -1
# funcs[3] takes a single argument, so only the three binary lambdas are applied here
[func(x,y) for x,y in zip(xrange(10),xrange(10,20)) for func in funcs[:3]]

Edit: with printing (have a look at the pprint module) and control flow:

add = True
(funcs[0] if add else funcs[1])(1,2)
>>> 3

from pprint import pprint
printMsg = lambda isWarning, msg: pprint('WARNING: ' + msg) if isWarning else pprint('MSG:' + msg)

Answer 8 (score: 3)

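Python really, really doesn't want you to do this. Not only does it have no way to define a multi-line anonymous function, but a function definition doesn't return the function either, so even if this were syntactically valid...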

mylist.sort(key=def _(v):
                    try:
                        return -v
                    except:
                        return None)

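...it still wouldn't work. (Although I guess that if it were syntactically valid, they would also make function definitions return the function, so it would work.)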

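So you can write your own function that makes a function from a string (using exec, of course) and pass it a triple-quoted string. It's a bit ugly syntactically, but it works: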

def function(text, cache={}):

    # strip everything before the first paren in case it's "def foo(...):"
    if not text.startswith("("):
        text = text[text.index("("):]

    # keep a cache so we don't recompile the same func twice
    if text in cache:
        return cache[text]

    exec "def func" + text
    func.__name__ = "<anonymous>"

    cache[text] = func
    return func

    # never executed; forces func to be local (a tiny bit more speed)
    func = None

Usage:

mylist.sort(key=function("""(v):
                                try:
                                    return -v
                                except:
                                    return None"""))
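
The helper above uses the Python 2 exec statement. Under Python 3, exec is a function and cannot create local variables, so a port would need an explicit namespace dict. A minimal sketch under that assumption follows; the `SOURCE` and `negate` names are hypothetical, and the bare `except:` is narrowed to `TypeError` here.

```python
def function(text, cache={}):
    """Build a function from source text that starts at its parameter list."""
    # Strip everything before the first paren in case it's "def foo(...):".
    if not text.startswith("("):
        text = text[text.index("("):]

    # The mutable default is deliberate: it caches compiled functions by source.
    if text in cache:
        return cache[text]

    # Python 3: exec() needs an explicit dict to receive the new binding.
    namespace = {}
    exec("def func" + text, namespace)
    func = namespace["func"]
    func.__name__ = "<anonymous>"

    cache[text] = func
    return func

# Hypothetical usage, mirroring the sort key from the answer.
SOURCE = """(v):
    try:
        return -v
    except TypeError:
        return None"""
negate = function(SOURCE)
descending = sorted([3, 1, 2], key=negate)
```

The same caveat applies as in the original: the body lives in a string, so editors and linters cannot check it.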

Answer 9 (score: 2)

The only way to make an anonymous function is with lambda, and as you know, a lambda can only contain a single expression.

You can create a number of functions with the same name, so at least you don't have to think of a new name for each one.
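
A short Python 3 sketch of that idea, assuming the throwaway name `_` and a plain list as the container (both chosen here for illustration):

```python
f_list = []

def _(msg):
    return 'message: ' + msg
f_list.append(_)  # store the function object before the name is reused

def _(msg):
    return 'warning: ' + msg
f_list.append(_)  # this is a different function object under the same name

del _  # optional: drop the last binding so no name is left behind

results = [f('hi') for f in f_list]
```

Each `def` rebinds `_` to a new function object, while the list keeps the earlier objects alive, so only one throwaway name is ever used.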

It would be great to have truly anonymous functions, but Python's syntax can't easily support them.

Answer 10 (score: 2)

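Personally, I'd just name it something, use it, and not fret about it "hanging around". The only thing you gain from suggestions such as redefining the name later or using del to drop it from the namespace is the potential for confusion or bugs if someone later comes along and moves some code around without understanding what you're doing.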

Answer 11 (score: 2)

You can use exec:

def define(arglist, body):
    g = {}
    exec("def anonfunc({0}):\n{1}".format(arglist,
                                     "\n".join("    {0}".format(line)
                                               for line in body.splitlines())), g)
    return g["anonfunc"]

f_list = [define("msg", "print(msg)")]
f_list[0]('hi')

Answer 12 (score: 1)

The only option is to use a lambda expression, as you mentioned. Without that, it is not possible. That is just how Python works.

Answer 13 (score: -1)

If your function is too complicated to fit in a lambda, then for readability's sake it is probably best to define it in a normal block anyway.