Question

I'm having a hard time finding the regular expressions for the beginning and the end of a file in Python. How can I do this?

Answer 1

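Read the whole file into a string; then \A matches only at the beginning of the string and \Z matches only at the end of the string. With re.MULTILINE, '^' matches at the beginning of the string and just after each newline, and '$' matches at the end of the string and just before each newline. See the Python documentation for re syntax.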

import re

data = '''sentence one.
sentence two.
a bad sentence
sentence three.
sentence four.'''

# find lines ending in a period
print(re.findall(r'^.*\.$', data, re.MULTILINE))
# match if the first line ends in a period
print(re.findall(r'\A^.*\.$', data, re.MULTILINE))
# match if the last line ends in a period
print(re.findall(r'^.*\.$\Z', data, re.MULTILINE))

Output:

['sentence one.', 'sentence two.', 'sentence three.', 'sentence four.']
['sentence one.']
['sentence four.']

Answer 2

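Perhaps you should state your question more clearly, for example by saying what you are trying to do. That said, you can slurp the file into one whole string and match your pattern using re.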

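import re

# read the whole file into one string ("file" is a placeholder path)
with open("file") as f:
    data = f.read()
# with re.DOTALL, '.' also matches newlines, so a match can span lines
pat = re.compile("^.*pattern.*$", re.M | re.DOTALL)
print(pat.findall(data))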

There are better ways to do whatever it is you want to do, without re.
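
As one possible reading of "without re", here is a minimal sketch that checks the first and last lines with plain string methods. The helper name first_and_last_end_with and the sample text are made up for illustration; they are not part of the answer above.

```python
def first_and_last_end_with(text, suffix="."):
    """Return (first_ok, last_ok) for the first and last lines of text."""
    lines = text.splitlines()
    if not lines:
        # empty input has no first or last line to check
        return (False, False)
    return (lines[0].endswith(suffix), lines[-1].endswith(suffix))


# hypothetical sample text standing in for the contents of a file
sample = "sentence one.\na bad sentence\nsentence four."
print(first_and_last_end_with(sample))
```

This avoids anchors entirely, which may be simpler when only the first or last line matters.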

Answer 3

The regex $ is not your friend; see this SO answer.
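
One well-known pitfall with $, offered here as an assumption about what the linked answer covers since it is not reproduced on this page: in Python, $ also matches just before a single trailing newline, even without re.MULTILINE, whereas \Z matches only at the very end of the string. A minimal sketch with a made-up sample string:

```python
import re

# hypothetical sample: file contents that end with a newline
text = "last line.\n"

# '$' succeeds because it can match just before the final newline
dollar_match = re.search(r"\.$", text)
# '\Z' fails because the period is not the very last character
strict_match = re.search(r"\.\Z", text)

print(dollar_match is not None)
print(strict_match is not None)
```

If the true end of the file matters, \Z is the stricter choice.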

Matching the start and end of a file in Python with regular expressions
