I have a file containing the following text:
INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'
INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'
$ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
$ INCLUDE '.\..\..\
$ FE_10-28\standalone\
$ ITFC.bdf'
I want a regular expression that captures the quoted paths and skips lines starting with $:
['.\..\..\FE_10-28\ASSY.bdf', '.\..\..\FE_10-28\standalone\COORD.bdf']
I managed to extract the single-line strings:
import re

with open(bdf_name, 'r') as f:
    file_buff = f.readlines()
text = ''.join(file_buff)
# Only finds single-line INCLUDE statements: "." does not cross line breaks here
regex_incl = re.compile(r"[^$]\s+include\s+'(.*)'", re.IGNORECASE | re.MULTILINE)
print(regex_incl.findall(text))
But how do I handle the multi-line ones?
Answer 0 (score: 2)
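First, you need the re.DOTALL flag; without it, the dot . does not match newlines. Also read the whole file at once instead of line by line. Anchoring the pattern with ^ under re.MULTILINE skips the commented lines, because those start with $ rather than INCLUDE.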
import re

with open(bdf_name, 'r') as f:
    data = f.read()  # read the whole file into one string
re.findall(r"^include\s+'(.*?)'", data,
           flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)
#['.\\..\\..\\\nFE_10-28\\\nASSY.bdf', '.\\..\\..\\FE_10-28\\standalone\\COORD.bdf']
If you don't want the newlines in the results, remove them with .replace("\n", "").
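Putting the steps above together, a minimal sketch might look like the following. The function name `find_includes` and the inline `SAMPLE` text are illustrative assumptions, not part of the original answer; the sample is a raw string so that the backslashes stay intact, as they would when read from a file.

```python
import re

# Hypothetical sample mirroring the file shown in the question.
SAMPLE = r"""INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'
INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'
$ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
$ INCLUDE '.\..\..\
$ FE_10-28\standalone\
$ ITFC.bdf'
"""

# ^ with MULTILINE anchors at the start of each line, so "$ INCLUDE" never matches.
# DOTALL lets the lazy (.*?) run across line breaks up to the closing quote.
INCLUDE_RE = re.compile(r"^include\s+'(.*?)'",
                        re.IGNORECASE | re.MULTILINE | re.DOTALL)


def find_includes(text):
    """Return INCLUDE paths with embedded line breaks removed."""
    return [re.sub(r"[\r\n]+", "", path) for path in INCLUDE_RE.findall(text)]


print(find_includes(SAMPLE))
# ['.\\..\\..\\FE_10-28\\ASSY.bdf', '.\\..\\..\\FE_10-28\\standalone\\COORD.bdf']
```

Stripping `\r` as well as `\n` is a defensive choice in case the text was read without newline translation; for a file opened in text mode, removing `\n` alone should be enough.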
Answer 1 (score: 2)
You can use this regex:
>>> raw = '''
... INCLUDE '.\..\..\
... FE_10-28\
... ASSY.bdf'
... INCLUDE '.\..\..\FE_10-28\standalone\COORD.bdf'
... $ INCLUDE '.\..\..\FE_10-28\standalone\bracket.bdf'
... $ INCLUDE '.\..\..\
... $ FE_10-28\standalone\
... $ ITFC.bdf'
... '''
>>> import re
>>> re.findall(r"^INCLUDE\s+'(.+?)'\n", raw, re.M|re.DOTALL)
['.\\..\\..FE_10-28ASSY.bdf', '.\\..\\..\\FE_10-28\\standalone\\COORD.bdf']
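Note that the first path in the output above has lost its line-ending backslashes. That is a property of the test string rather than of the regex: in a non-raw Python string literal, a backslash immediately followed by a newline is a line continuation, so both characters are dropped. Text read from a file keeps them. The sketch below contrasts the two cases; the names `cooked`, `raw_text`, and `pattern` are illustrative only.

```python
import re

# Non-raw literal: backslash + newline is a line continuation, so both vanish.
cooked = '''
INCLUDE '.\\..\\..\
FE_10-28\
ASSY.bdf'
'''

# Raw literal: backslashes and newlines survive, as they would in file contents.
raw_text = r'''
INCLUDE '.\..\..\
FE_10-28\
ASSY.bdf'
'''

pattern = re.compile(r"^INCLUDE\s+'(.+?)'\n", re.M | re.DOTALL)

cooked_paths = pattern.findall(cooked)
raw_paths = [p.replace("\n", "") for p in pattern.findall(raw_text)]

print(cooked_paths)  # ['.\\..\\..FE_10-28ASSY.bdf']
print(raw_paths)     # ['.\\..\\..\\FE_10-28\\ASSY.bdf']
```

Also keep in mind that this pattern requires a newline right after the closing quote, so an INCLUDE statement on the very last line of a file without a trailing newline would not match.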