Python regular expression to match multi-line text

Time: 2018-03-27 06:38:24

Tags: python regex

I have the following text in a file.

INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'

INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'

$ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'

$ INCLUDE '.\..\..\
$ FE_10-28\standalone\
$ ITFC.bdf'

I want an expression that captures the following strings (lines starting with $ should be skipped):

['.\..\..\FE_10-28\ASSY.bdf', '.\..\..\FE_10-28\standalone\COORD.bdf']

I managed to match the single-line strings:

    import re

    with open(bdf_name, 'r') as f:
        file_buff = f.readlines()

    text = ''.join(file_buff)
    # raw string so that \s reaches the regex engine unchanged
    regex_incl = re.compile(r"[^$]\s+include\s+'(.*)'", re.IGNORECASE | re.MULTILINE)
    print(regex_incl.findall(text))

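But what about the strings that span multiple lines?

2 answers:

Answer 0 (score: 2)

First, you need the re.DOTALL flag; without it the dot . does not match newline characters. Also read the whole file at once rather than line by line.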

import re

with open(bdf_name, 'r') as f:
    data = f.read()  # read the whole file at once so a match can span lines

re.findall(r"^include\s+'(.*?)'", data,
           flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)
# ['.\\..\\..\\\nFE_10-28\\\nASSY.bdf', '.\\..\\..\\FE_10-28\\standalone\\COORD.bdf']

If you don't want the newlines in the results, remove them with .replace("\n", "").
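
Putting the two steps together, here is a minimal sketch that should produce the list asked for in the question. The sample text is inlined as a raw string so the snippet runs without a file, and the helper name find_includes is made up for illustration; with a real file you would pass the result of f.read() instead.

```python
import re

# Sample text mirroring the question's file. A raw string keeps each
# backslash and line break exactly as they would come back from f.read().
data = r"""INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'

INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'

$ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'

$ INCLUDE '.\..\..\
$ FE_10-28\standalone\
$ ITFC.bdf'
"""

# ^ with MULTILINE anchors at the start of each line, so "$ INCLUDE" lines
# never match; DOTALL lets .*? run across the line breaks inside the quotes.
INCLUDE_RE = re.compile(r"^include\s+'(.*?)'",
                        re.IGNORECASE | re.MULTILINE | re.DOTALL)


def find_includes(text):
    """Return the quoted INCLUDE paths with embedded newlines removed."""
    return [path.replace("\n", "") for path in INCLUDE_RE.findall(text)]


paths = find_includes(data)
print(paths)
```

Note that this assumes every opening quote has a closing quote; with DOTALL an unterminated path would swallow text up to the next quote in the file.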

Answer 1 (score: 2)

You can use this regex:

>>> raw = '''
... INCLUDE '.\..\..\
... FE_10-28\
... ASSY.bdf'
...
... INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'
...
... $ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
...
... $ INCLUDE '.\..\..\
... $ FE_10-28\standalone\
... $ ITFC.bdf'
... '''
>>>
>>> re.findall(r"^INCLUDE\s+'(.+?)'\n", raw, re.M|re.DOTALL)
['.\\..\\..FE_10-28ASSY.bdf', '.\\..\\..\\FE_10-28\\standalone\\COORD.bdf']
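
One caveat about the session above: the first result has no newlines (and no backslashes at the line ends) only because the sample was typed as a regular triple-quoted literal, where a backslash followed by a newline is a line continuation and both characters are dropped. Text read from a file keeps them, so the newlines still have to be stripped from the matches. The sketch below illustrates the difference, and also shows an alternative pattern (not taken from either answer) that uses a negated character class instead of re.DOTALL; the variable names are illustrative.

```python
import re

# Regular literal: backslash + newline is a line continuation and vanishes.
plain = '''FE_10-28\
ASSY.bdf'''

# Raw literal: the backslash and the newline are both kept, as in a file.
raw = r'''FE_10-28\
ASSY.bdf'''

print(repr(plain))
print(repr(raw))

# Hypothetical sample standing in for the file contents.
text = r"""
INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'

INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'

$ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
"""

# [^']* matches any character except a quote, newlines included,
# so the pattern can span lines without re.DOTALL.
pattern = re.compile(r"^INCLUDE\s+'([^']*)'", re.MULTILINE)
includes = [p.replace("\n", "") for p in pattern.findall(text)]
print(includes)
```

The negated class also stops at the first closing quote by construction, so no lazy quantifier is needed.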