Question

I'm parsing a file and want to check each line against several complicated regular expressions. Something like this:

if re.match(regex1, line): do stuff
elif re.match(regex2, line): do other stuff
elif re.match(regex3, line): do still more stuff
...

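Of course, to actually do the stuff, I need the match object. I can think of only three possibilities, each of which leaves something to be desired.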

if re.match(regex1, line): 
    m = re.match(regex1, line)
    do stuff
elif re.match(regex2, line):
    m = re.match(regex2, line)
    do other stuff
...

This requires doing each complex match twice (these are long files and long regexes :/).

m = re.match(regex1, line)
if m: do stuff
else:
    m = re.match(regex2, line)
    if m: do other stuff
    else:
       ...

This gets really ugly as the indentation grows deeper and deeper.

while True:
    m = re.match(regex1, line)
    if m:
        do stuff
        break
    m = re.match(regex2, line)
    if m:
        do other stuff
        break
    ...

This just looks weird.

What is the right way to do this?

Answer 1

You could define a function for the action each regex requires and do something like this:

def dostuff(m):
    ...  # handle a regex1 match

def dootherstuff(m):
    ...  # handle a regex2 match

def doevenmorestuff(m):
    ...  # handle a regex3 match

actions = ((regex1, dostuff), (regex2, dootherstuff), (regex3, doevenmorestuff))

for regex, action in actions:
    m = re.match(regex, line)
    if m:
        action(m)  # pass the match object so the handler can read its groups
        break

Answer 2

# regex1..regex3 must be compiled patterns (re.compile) for .match() to exist
for patt in (regex1, regex2, regex3):
    match = patt.match(line)
    if match:
        if patt == regex1:
            pass  # some handling
        elif patt == regex2:
            pass  # more
        elif patt == regex3:
            pass  # more
        break

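I like Tim's answer because it separates the handling code for each regex, which keeps things simple. With my approach, I wouldn't put more than a line or two of code under each match; if you need more, call a separate function.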

Answer 3

First of all, do you really need regexes for your matching? Where I would use regexps in, say, Perl, I often use string functions (find, startswith, etc.) in Python.
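As a rough illustration of the string-function route, here is a minimal sketch. The line formats (a "#" comment prefix, "key = value" settings, an "ERROR" marker) and the name classify are assumptions made up for this example, not something from the question.

```python
def classify(line):
    """Classify a line with plain string methods instead of regexes."""
    if line.startswith("#"):
        return "comment"
    if "=" in line:
        # partition splits on the first "=" only
        key, _, value = line.partition("=")
        return ("setting", key.strip(), value.strip())
    if line.find("ERROR") != -1:
        return "error"
    return "other"
```

Because each test returns the data it needs directly, the problem of keeping a match object around does not arise in this style.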

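If you really do need regexps, you can write a simple search function that performs the search, stores the match in a store object, and returns True if there was a match.

For example: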

def search(pattern, s, store):
    match = re.search(pattern, s)
    store.match = match
    return match is not None

class MatchStore(object):
    pass   # irrelevant, any object with a 'match' attr would do

where = MatchStore()
if search(pattern1, s, where):
    pass  # pattern1 matched, match object is in where.match
elif search(pattern2, s, where):
    pass  # pattern2 matched, match object is in where.match
...

Answer 4

In this particular case there seems to be no convenient way to do this in Python. If Python accepted the syntax:

if (m = re.match(pattern,string)):
    text = m.group(1)

then everything would be fine, but apparently you can't do that.
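That limitation held when this answer was written. Python 3.8 later added assignment expressions (PEP 572), spelled := rather than =, which allow this shape directly. A minimal sketch, assuming Python 3.8 or later; the patterns and the name first_field are invented for illustration:

```python
import re

def first_field(line):
    """Try several patterns in an if/elif chain, keeping the match object."""
    if (m := re.match(r"(\w+)=(\d+)", line)):
        return ("assignment", m.group(1), int(m.group(2)))
    elif (m := re.match(r"#\s*(.*)", line)):
        return ("comment", m.group(1))
    return None
```

Each regex is evaluated at most once per line, and m is bound to the match that succeeded inside the corresponding branch.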

Answer 5

Your last suggestion becomes more Pythonic when it is wrapped in a function:

def parse_line(line):
    m = re.match(regex1, line)
    if m:
        do stuff
        return
    m = re.match(regex2, line)
    if m:
        do other stuff
        return
    ...

That said, with a simple container class that overloads a few operators, you can get closer to what you want:

class ValueCache():
    """A simple container with a returning assignment operator."""
    def __init__(self, value=None):
        self.value = value
    def __repr__(self):
        return "ValueCache({})".format(self.value)
    def set(self, value):
        self.value = value
        return value
    def __call__(self):
        return self.value
    def __lshift__(self, value):
        return self.set(value)
    def __rrshift__(self, value):
        return self.set(value)

match = ValueCache()
if (match << re.match(regex1, line)):
    pass  # do stuff with match()
elif (match << re.match(regex2, line)):
    pass  # do other stuff with match()

Answer 6

I would break your regexes into smaller components and test the simple parts first, running the longer match only afterwards.

Something like:

if re.match(simplepart,line):
      if re.match(complexregex, line):
          do stuff
elif re.match(othersimple, line):
      if re.match(complexother, line):
          do other stuff
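The same idea can be sketched in runnable form with a cheap substring test as the simple part. The log-line format, the COMPLEX pattern, and the name parse_error below are assumptions for illustration only:

```python
import re

# Hypothetical "complex" pattern: an ISO-like timestamp, then ERROR, then a message
COMPLEX = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\s+ERROR\s+(.*)"
)

def parse_error(line):
    """Return the error message, or None if the line is not an error line."""
    # Cheap check first: lines without "ERROR" never reach the regex.
    if "ERROR" not in line:
        return None
    m = COMPLEX.match(line)
    if m is None:
        return None
    return m.group(7)
```

Whether the prefilter pays off depends on how many lines it rejects, so it is worth measuring on real input before adopting it.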

Answer 7

Why not use a dictionary as a switch statement?

def action1(match, line):
    ...  # do the stuff 1
def action2(match, line):
    ...  # do the stuff 2

regex_action_dict = {regex1: action1, regex2: action2}
for regex, action in regex_action_dict.items():  # .iteritems() on Python 2
    match_object = re.match(regex, line)
    if match_object:
        action(match_object, line)

Answer 8

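FWIW, I've stressed over the same thing, and I usually settle for the second form (nested elses) or some variation of it. If you want to optimize for readability, I don't think you'll find anything much better in general (many of these answers look significantly less readable to me than the candidates in the question).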

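Sometimes, if you're inside an outer loop or a short function, you can use a variation of the third form (the one with break statements) in which you continue or return instead. That is readable enough, but I definitely wouldn't create a while True block just to avoid the "ugliness" of the other candidates.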

Answer 9

Here is an example of my solution; only one re.search() is performed per line:

text = '''\
koala + image @ wolf - snow
Good evening, ladies and gentlemen
An uninteresting line
There were 152 ravens on a branch
sea mountain sun ocean ice hot desert river'''

import re
regx3 = re.compile(r'hot[ \t]+([^ ]+)')
regx2 = re.compile(r'(\d+|ev.+?ng)')
regx1 = re.compile(r'([%~#`\@+=\d]+)')
# Combined pattern: each alternative contributes exactly one capturing group
regx  = re.compile('|'.join((regx3.pattern, regx2.pattern, regx1.pattern)))

def one_func(line):
    print('I am one_func on : ' + line)

def other_func(line):
    print('I am other_func on : ' + line)

def another_func(line):
    print('I am another_func on : ' + line)

tupl_funcs = (one_func, other_func, another_func)


for line in text.splitlines():
    print(line)
    m = regx.search(line)
    if m:
        print('m.groups() : ', m.groups())
        # The index of the first group that matched selects the handler
        group_number = next(i for i, g in enumerate(m.groups()) if g)
        print("group_number : ", group_number)
        tupl_funcs[group_number](line)
    else:
        print('No match')
        print('No treatment')
    print()

Result

koala + image @ wolf - snow
m.groups() :  (None, None, '+')
group_number :  2
I am another_func on : koala + image @ wolf - snow

Good evening, ladies and gentlemen
m.groups() :  (None, 'evening', None)
group_number :  1
I am other_func on : Good evening, ladies and gentlemen

An uninteresting line
No match
No treatment

There were 152 ravens on a branch
m.groups() :  (None, '152', None)
group_number :  1
I am other_func on : There were 152 ravens on a branch

sea mountain sun ocean ice hot desert river
m.groups() :  ('desert', None, None)
group_number :  0
I am one_func on : sea mountain sun ocean ice hot desert river
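A variant of the combined-pattern approach above, not part of the original answer: if each alternative is a named group, the match object's lastgroup attribute reports which alternative matched, so no scan over m.groups() is needed. The patterns, handler behavior, and the name dispatch in this sketch are invented for illustration, and each alternative deliberately contains no nested groups.

```python
import re

# One named group per alternative, with no nested capturing groups
PATTERN = re.compile(r"(?P<number>\d+)|(?P<word>[A-Za-z]+)|(?P<symbol>[+@#=]+)")

# Hypothetical handlers keyed by group name
HANDLERS = {
    "number": lambda text: ("number", int(text)),
    "word": lambda text: ("word", text.lower()),
    "symbol": lambda text: ("symbol", len(text)),
}

def dispatch(line):
    """Run a single search and call the handler for whichever group matched."""
    m = PATTERN.search(line)
    if m is None:
        return None
    name = m.lastgroup  # name of the alternative that matched
    return HANDLERS[name](m.group(name))
```

As with the answer's version, the search finds the leftmost match in the line, so the order of alternatives only matters when two of them could match at the same position.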

Answer 10

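Create a class that holds the match as state. Instantiate it before the conditionals; it should also store the string you are matching against.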
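One possible reading of this suggestion, as a minimal sketch: the class name LineMatcher, its matches method, and the patterns in describe are all assumptions, since the answer gives no code.

```python
import re

class LineMatcher:
    """Holds the string being tested and the most recent match object."""

    def __init__(self, line):
        self.line = line
        self.match = None

    def matches(self, pattern):
        # Store the result so the caller can read it inside the branch
        self.match = re.match(pattern, self.line)
        return self.match is not None

def describe(line):
    lm = LineMatcher(line)
    if lm.matches(r"(\d+) items"):
        return int(lm.match.group(1))
    elif lm.matches(r"name: (\w+)"):
        return lm.match.group(1)
    return None
```

Storing the line in the instance means each condition only has to name the pattern.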

Answer 11

You can define a local function that accepts a regex, tests it against your input, and stores the result in a closure-scoped variable:

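def parse_line(line):
    match = None

    def matches(pattern):
        # nonlocal needs an enclosing function scope; `line` is only read,
        # so it does not have to be declared
        nonlocal match
        match = re.match(pattern, line)
        return match

    if matches(regex1):
        pass  # do stuff with `match`

    elif matches(regex2):
        pass  # do other stuff with `match`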

I'm not sure this approach is Pythonic, but it's the cleanest way I've found to do regex matching in an if-elif-else chain while keeping the match object.

Note that this approach only works in Python 3.0+, because it requires the PEP 3104 nonlocal statement. In earlier Python versions, there's no clean way for a function to assign to a variable in a non-global parent scope.

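It's also worth noting that if your file is large enough that you're worried about running a regex twice per line, you should also precompile the patterns with re.compile and pass the resulting regex objects to your check function instead of the raw strings.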
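A minimal sketch of the precompilation advice, with patterns compiled once at import time and reused for every line. The RULES table, its labels, and the function scan are hypothetical names chosen for this example:

```python
import re

# Compiled once, reused for every line
RULES = [
    (re.compile(r"(\w+)\s*=\s*(.*)"), "setting"),
    (re.compile(r"\[(\w+)\]"), "section"),
]

def scan(lines):
    """Return (label, groups) for the first rule that matches each line."""
    results = []
    for line in lines:
        for pattern, label in RULES:
            m = pattern.match(line)  # compiled patterns have their own .match()
            if m:
                results.append((label, m.groups()))
                break
    return results
```

The re module keeps an internal cache of recently compiled patterns, so the gain over passing raw strings may be modest; compiling up front mainly avoids the per-call cache lookup and makes the cost explicit.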

Answer 12

You can define a class that wraps the match object and has a call method that performs the match:

class ReMatcher(object):
    match = None

    def __call__(self, pattern, string):
        self.match = re.match(pattern, string)
        return self.match

    def __getattr__(self, name):
        return getattr(self.match, name)

Then call it in your conditions and use it as if it were the match object inside the resulting blocks:

match = ReMatcher()

if match(regex1, line):
    print(match.group(1))

elif match(regex2, line):
    print(match.group(1))

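This should work in nearly any version of Python, with slight adjustments for versions that predate new-style classes. As with my other answer, you should use re.compile if you're concerned about regex performance.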

Python regex matching in conditional statements

12 answers: