Question

I have a string like this:

----------

FT Weekend

----------

Why do we run marathons?
Are marathons and cycling races about more than exercise? What does the 
literature of endurance tell us about our thirst for self-imposed hardship?

I want to remove the part from ---------- to the next ----------.

I have been using re.sub:

pattern = r"-+\n.+\n-+"
re.sub(pattern, '', thestring)

Answer 1

pattern = r"-+\n.+?\n-+"
re.sub(pattern, '', thestring, flags=re.DOTALL)

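Use the DOTALL flag. The problem with your regex is that, by default, . does not match \n, so you need to add the DOTALL flag explicitly to make it match \n.

See the demo.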

https://regex101.com/r/hR7tH4/24

Or

pattern = r"-+\n[\s\S]+?\n-+"
re.sub(pattern, '', thestring)

if you do not want to add a flag.
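To make the difference concrete, here is a minimal sketch comparing the original pattern with the two fixes above. The sample string is shortened and its exact blank lines are an assumption based on the question, so treat it as an illustration rather than the asker's real data.

```python
import re

# Shortened sample modeled on the question; the exact blank lines are an assumption.
text = (
    "----------\n"
    "\n"
    "FT Weekend\n"
    "\n"
    "----------\n"
    "\n"
    "Why do we run marathons?\n"
)

# Without DOTALL, "." cannot cross a newline, so the multi-line block never matches.
unchanged = re.sub(r"-+\n.+\n-+", "", text)

# With DOTALL, "." also matches "\n"; the lazy ".+?" stops at the first closing dashes.
with_flag = re.sub(r"-+\n.+?\n-+", "", text, flags=re.DOTALL)

# "[\s\S]" matches any character at all, so no flag is needed.
without_flag = re.sub(r"-+\n[\s\S]+?\n-+", "", text)

print(repr(with_flag))
```

Both fixed patterns leave the blank lines that followed the closing dashes in place; strip them separately if they are not wanted.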

Answer 2

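Your regex does not match the intended part because .+ does not match newlines. You can force . to match newlines with the re.DOTALL flag (or its alias re.S). Alternatively, you can use a negated character class: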

>>> print(re.sub(r"-+[^-]+-+", '', s))

Why do we run marathons?
Are marathons and cycling races about more than exercise? What does the 
literature of endurance tell us about our thirst for self-imposed hardship? 
>>>

Or, more precisely:

>>> print(re.sub(r"-+[^-]+-+[^\w]+", '', s))
Why do we run marathons?
Are marathons and cycling races about more than exercise? What does the 
literature of endurance tell us about our thirst for self-imposed hardship? 
>>>
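As a rough check of the negated-class approach, the sketch below runs both patterns on a shortened sample (the blank lines are an assumption) and then on a hypothetical block that itself contains a hyphen, which is the main caveat of [^-]+.

```python
import re

# Shortened sample modeled on the question; the blank lines are an assumption.
s = "----------\n\nFT Weekend\n\n----------\n\nWhy do we run marathons?\n"

# "[^-]+" matches everything, newlines included, up to the next dash.
header_removed = re.sub(r"-+[^-]+-+", "", s)

# Adding "[^\w]+" also consumes the blank lines after the closing dashes.
cleaned = re.sub(r"-+[^-]+-+[^\w]+", "", s)

# Caveat (hypothetical input): a hyphen inside the block ends the match early,
# so only part of the block is removed and the closing dashes are left behind.
hyphenated = "----------\nself-imposed\n----------\nbody\n"
partial = re.sub(r"-+[^-]+-+", "", hyphenated)

print(repr(cleaned))
print(repr(partial))
```

If the text between the dashes can contain hyphens, the DOTALL-based patterns are probably the safer choice.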

Answer 3

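Your regex (-+\n.+\n-+) has two problems: . matches any character except a newline, and .+ is greedy, so once newlines are allowed it can span several ------- blocks.

You can use the following regex: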

pattern = r"(?s)-+\n.+?\n-+"

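The (?s) single-line option makes . match any character, including a newline. The .+? pattern matches one or more characters, but as few as possible, up to the next run of dashes.

See the IDEONE demo.

For a more thorough cleanup, I suggest: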

pattern = r"(?s)\s*-+\n.+?\n-+\s*"

See another demo.
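The effect of greediness is easiest to see with more than one delimited block. The sketch below uses a made-up two-block string (not from the question) to compare .+ with .+?, and then applies the \s* variant to a shortened sample whose blank lines are an assumption.

```python
import re

# Hypothetical sample with two delimited blocks, to show greedy vs. lazy matching.
two_blocks = "-----\nA\n-----\nkeep\n-----\nB\n-----\nend"

# Greedy ".+" runs to the last closing dashes and swallows "keep" as well.
greedy = re.sub(r"(?s)-+\n.+\n-+", "", two_blocks)

# Lazy ".+?" stops at the nearest closing dashes, so each block is removed separately.
lazy = re.sub(r"(?s)-+\n.+?\n-+", "", two_blocks)

# Shortened sample modeled on the question; the blank lines are an assumption.
text = "----------\n\nFT Weekend\n\n----------\n\nWhy do we run marathons?\n"

# The surrounding "\s*" also removes the whitespace around the block.
cleaned = re.sub(r"(?s)\s*-+\n.+?\n-+\s*", "", text)

print(repr(greedy))
print(repr(lazy))
print(repr(cleaned))
```

Note that the \s* on both sides removes every newline around a block, so when a block sits between two pieces of text they end up joined; replacing with "\n" instead of an empty string may be preferable in that case.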
