Question

My regular expression is:

import re

TMP_REGEXP = r'_\(\s*(.*)\s*\)\s*$'
TMP_PATTERN = re.compile(TMP_REGEXP, re.MULTILINE)

The file input_data.txt:

print _(
    'Test #A'
    )              

print _(
    '''Test #B'''
    '''Test #C'''
)

I run it like this:

import codecs

with codecs.open('input_data.txt', encoding='utf-8') as flp:
    content = flp.read()

extracted = re.findall(TMP_PATTERN, content)

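What I want to achieve is:
- take all characters that follow '_('
- stop reading characters at a ')' that is followed by zero or more whitespace characters and the end of the line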

Interestingly, 'Test #A' works like a charm, but 'Test #B' is skipped.

Answer 1

This works for me:

m = re.findall(r'(?s)_\((.*?)\)', content)

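(?s) makes . match any character, including newlines.

_\( matches the start you want.

(.*?) matches as few characters as possible (a lazy match).

\) matches the end you want.

Finally, you may want to add $ (together with re.MULTILINE, so that it matches at the end of each line) and do some stripping.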
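As a rough sketch of that last suggestion, the pattern below adds the end-of-line anchor and moves the stripping into the regex itself. The name CALL_PATTERN and the inline sample text are assumptions made for illustration; they are not part of the original question or answer.

```python
import re

# Hypothetical pattern name. re.DOTALL lets '.' cross newlines, and
# re.MULTILINE lets '$' match at the end of every line.
# The \s* on both sides of the lazy group keeps the surrounding
# whitespace out of the captured text.
CALL_PATTERN = re.compile(r'_\(\s*(.*?)\s*\)[ \t]*$', re.DOTALL | re.MULTILINE)

# Sample input written with explicit newlines so the trailing spaces
# after the first ')' are visible.
content = (
    "print _(\n"
    "    'Test #A'\n"
    "    )              \n"
    "\n"
    "print _(\n"
    "    '''Test #B'''\n"
    "    '''Test #C'''\n"
    ")\n"
)

extracted = CALL_PATTERN.findall(content)
for i, match in enumerate(extracted, 1):
    print("Match {0}: {1!r}".format(i, match))

# A ')' inside the string is skipped because it is not followed by the
# end of the line.
nested = CALL_PATTERN.findall("print _(\n    'Test (A)'\n)\n")
```

Because the closing parenthesis must be followed only by spaces or tabs and then the end of the line, a ')' that appears inside the quoted text does not end the match early, which the unanchored version cannot guarantee.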

>>> content = """
... print _(
...     'Test #A'
...     )              
... 
... print _(
...     '''Test #B'''
...     '''Test #C'''
... )
... """
>>> import re
>>> m = re.findall(r'(?s)_\((.*?)\)', content)
>>> for i, match in enumerate(m, 1):
...     print("Match {0}: {1}".format(i, match))
... 
Match 1: 
    'Test #A'

Match 2: 
    '''Test #B'''
    '''Test #C'''

>>>
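To see why the pattern from the question skips 'Test #B', it may help to run it side by side with the one above. This is only a comparison sketch; the variable names are made up for illustration. Without the (?s) flag, . never matches a newline, so (.*) can cover a single line at most, and the two-line argument can never be matched. A greedy variant is included to show why the ? after .* matters.

```python
import re

content = (
    "print _(\n"
    "    'Test #A'\n"
    "    )              \n"
    "\n"
    "print _(\n"
    "    '''Test #B'''\n"
    "    '''Test #C'''\n"
    ")\n"
)

# Pattern from the question: '.' stops at a newline, so (.*) spans one line only.
original = re.compile(r'_\(\s*(.*)\s*\)\s*$', re.MULTILINE)
# Pattern from this answer: (?s) lets '.' cross newlines, '?' keeps each match short.
lazy = re.compile(r'(?s)_\((.*?)\)')
# Greedy variant: runs from the first '_(' to the very last ')'.
greedy = re.compile(r'(?s)_\((.*)\)')

original_matches = original.findall(content)
lazy_matches = lazy.findall(content)
greedy_matches = greedy.findall(content)

print(original_matches)      # only the single-line argument is found
print(len(lazy_matches))     # one match per call
print(len(greedy_matches))   # both calls are swallowed into a single match
```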
