I have the following text:
Test 123:
This is a blue car
Test:
This car is not blue
This car is yellow
Hello:
This is not a test
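I want to put together a regex that finds every item that starts with Test or Hello, optionally followed by a three-digit number, and ends with a colon, and then returns all the content up to the next line that fits the same description. So for the text above, a findall regex would return an array: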
[("Test", "123", "\nThis is a blue car\n"),
("Test", "", "\nThis car is not blue\n\nThis car is yellow\n"),
("Hello", "", "\nThis is not a test")]
So far, I've got this:
r = re.findall(r'^(Test|Hello) *([^:]*):$', test, re.MULTILINE)
It matches each header line as described, but I'm not sure how to capture the content up to the next line that ends with a colon. Any ideas?
Answer 0: (score: 5)
You can use the following regex together with the DOTALL modifier, so that . also matches newlines:
(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)
>>> import re
>>> s = """Test 123:
...
... This is a blue car
...
... Test:
...
... This car is not blue
...
... This car is yellow
...
... Hello:
...
... This is not a test"""
>>> re.findall(r'(?s)(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)', s)
[('Test', '123', '\nThis is a blue car\n'), ('Test', '', '\nThis car is not blue\n\nThis car is yellow\n'), ('Hello', '', '\nThis is not a test')]
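To make it easier to see what each part of that pattern does, here is the same regex spelled out with re.VERBOSE. This is only a sketch: the variable names (text, pattern, sections) are mine, and the single space before the * is written as [ ] because VERBOSE mode ignores bare whitespace.

```python
import re

text = (
    "Test 123:\n\nThis is a blue car\n\n"
    "Test:\n\nThis car is not blue\n\nThis car is yellow\n\n"
    "Hello:\n\nThis is not a test"
)

pattern = re.compile(
    r"""
    (?:^|\n)                # start of the string, or the newline before a header
    (Test|Hello)            # group 1: the header keyword
    [ ]*                    # optional spaces between keyword and number
    ([^:]*)                 # group 2: the optional number (anything up to the colon)
    :\n                     # the colon and the newline ending the header line
    (.*?)                   # group 3: the body, matched lazily
    (?=\n(?:Test|Hello)|$)  # stop just before the next header, or at the end
    """,
    re.DOTALL | re.VERBOSE,
)

sections = pattern.findall(text)
for keyword, number, body in sections:
    print(repr(keyword), repr(number), repr(body))
```

One thing to keep in mind: the lookahead only checks that the next line begins with Test or Hello, so a body line such as "Testing the car" would also end the capture early. If that can happen in your data, tighten the lookahead to require the colon as well.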
Answer 1: (score: 0)
import re
# DOTALL lets .*? run across lines; IGNORECASE also accepts headers like "test:" or "hello:"
p = re.compile(r'(Test|Hello)\s*([^:]*):\n(\n.*?)(?=Test[^:]*:|Hello[^:]*:|$)', re.DOTALL | re.IGNORECASE)
test_str = "Test 123:\n\nThis is a blue car\n\nTest:\n\nThis car is not blue\n\nThis car is yellow\n\nHello:\n\nThis is not a test"
re.findall(p, test_str)
You can try this. See the demo.
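If a single regex gets hard to maintain, another option is to match only the header lines (much like the pattern in the question) and slice the text between consecutive matches. The following is a rough sketch under my own assumptions: split_sections is a hypothetical helper name, and I used \d* for the number instead of [^:]* so the header cannot spill over several lines.

```python
import re

# Header line: keyword, optional spaces, optional digits, colon at end of line
HEADER = re.compile(r'^(Test|Hello) *(\d*):$', re.MULTILINE)

def split_sections(text):
    """Return (keyword, number, body) tuples, one per header line."""
    headers = list(HEADER.finditer(text))
    sections = []
    for i, match in enumerate(headers):
        # Body starts after the newline that ends the header line
        body_start = match.end() + 1
        if i + 1 < len(headers):
            # Drop the newline that sits directly before the next header
            body_end = headers[i + 1].start() - 1
        else:
            body_end = len(text)
        sections.append((match.group(1), match.group(2), text[body_start:body_end]))
    return sections

text = (
    "Test 123:\n\nThis is a blue car\n\n"
    "Test:\n\nThis car is not blue\n\nThis car is yellow\n\n"
    "Hello:\n\nThis is not a test"
)
result = split_sections(text)
print(result)
```

Because the header pattern requires the colon at the end of the line, a body line that merely starts with Test or Hello is not mistaken for a new section.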