Question

I want to split a piece of text wherever a line contains only "----". I am using the re.split(..) method, but it does not behave as expected. What am I missing?

import re

s = """width:5
----
This is a test sentence to test the width thing"""

print re.split('^----$', s)

This just prints:

['width:5\n----\nThis is a test sentence to test the width thing']

Answer 1

You are missing the MULTILINE flag:

print re.split(r'^----$', s, flags=re.MULTILINE)

Without it, ^ and $ apply to the whole string s rather than to each line in the string:

re.MULTILINE

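When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline).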

Demo:

>>> import re
>>> 
>>> s = """width:5
... ----
... This is a test sentence to test the width thing"""
>>> 
>>> print re.split(r'^----$', s, flags=re.MULTILINE)
['width:5\n', '\nThis is a test sentence to test the width thing']
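Both pieces in the demo still carry the newline that sat next to the separator line. Here is a minimal sketch of two ways to drop it, assuming that leftover newline is unwanted: strip each piece afterward, or let the pattern consume the neighboring newlines. The names `parts_stripped` and `parts_consumed` are illustrative only.

```python
import re

s = """width:5
----
This is a test sentence to test the width thing"""

# Option 1: split on the separator line, then strip leading/trailing
# newlines from each piece (this also removes any blank lines at the edges).
parts_stripped = [p.strip("\n")
                  for p in re.split(r"^----$", s, flags=re.MULTILINE)]

# Option 2: make the pattern swallow the newline on either side.
# The newlines are optional so a separator on the first or last line
# of the string is still matched.
parts_consumed = re.split(r"\n?^----$\n?", s, flags=re.MULTILINE)

print(parts_stripped)
print(parts_consumed)
```

Both variants give ['width:5', 'This is a test sentence to test the width thing'] for this input.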

Answer 2

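Also, without MULTILINE you cannot use ^ and $ here, because they tell the regex engine to match only at the very start and very end of the string. Alternatively, use lookaround assertions, which keep the \n: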

>>> print re.split('(?<=\n)----(?=\n)', s)
['width:5\n', '\nThis is a test sentence to test the width thing']
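One difference from the MULTILINE approach is worth seeing in code. The lookaround pattern requires an actual newline on both sides of the separator, so a "----" on the first or last line of the string is not matched. A small sketch, using a made-up input `edge`:

```python
import re

# Hypothetical input: separator lines at the very start and very end.
edge = "----\nbody\n----"

# Lookarounds need a real "\n" before and after, so nothing matches here.
with_lookaround = re.split(r"(?<=\n)----(?=\n)", edge)

# With MULTILINE, ^ and $ also match at the start and end of the string.
with_multiline = re.split(r"^----$", edge, flags=re.MULTILINE)

print(with_lookaround)  # ['----\nbody\n----']
print(with_multiline)   # ['', '\nbody\n', '']
```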

Answer 3

Another way to split, without using a regular expression:

s.split("\n----\n")
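The literal split works when the separator line is surrounded by exactly "\n" on both sides. Below is a sketch of a regex-free variant that compares whole lines instead, assuming you also want to cope with other line endings such as "\r\n". `split_on_line` is a hypothetical helper written for this example, not a library function, and it rejoins lines with "\n".

```python
def split_on_line(text, sep="----"):
    """Split text into chunks at every line that equals sep exactly."""
    chunks, current = [], []
    for line in text.splitlines():   # handles \n, \r\n and other line breaks
        if line == sep:
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))
    return chunks


# Same text as in the question, but with Windows line endings.
crlf = "width:5\r\n----\r\nThis is a test sentence to test the width thing"

print(crlf.split("\n----\n") == [crlf])  # True: the literal separator is not found
print(split_on_line(crlf))
# ['width:5', 'This is a test sentence to test the width thing']
```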

Answer 4

Less code, and it gives the expected result for this input:

In:

re.split('[\n-]+', s)

Out:

['width:5', 'This is a test sentence to test the width thing']
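A caveat with this pattern: the character class treats any run of newlines or hyphens as a separator, not only a line consisting of "----". The sketch below uses a made-up input `text` that contains a hyphenated word to show the difference from the anchored pattern.

```python
import re

# Hypothetical input with a hyphen inside the text itself.
text = "width:5\n----\nA well-known test"

by_char_class = re.split('[\n-]+', text)
by_anchored_line = re.split(r'^----$', text, flags=re.MULTILINE)

print(by_char_class)     # ['width:5', 'A well', 'known test']
print(by_anchored_line)  # ['width:5\n', '\nA well-known test']
```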

Answer 5

Have you tried:

result = re.split("^----$", subject_text, 0, re.MULTILINE)
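In that call the third positional argument is maxsplit (0 means no limit) and the fourth is flags. A sketch of the same call with keyword arguments, which makes the roles explicit; the value of `subject_text` here is invented for illustration.

```python
import re

# Hypothetical sample text with two separator lines.
subject_text = "a\n----\nb\n----\nc"

# Equivalent to re.split("^----$", subject_text, 0, re.MULTILINE).
all_parts = re.split(r"^----$", subject_text, maxsplit=0, flags=re.MULTILINE)

# maxsplit=1 stops after the first separator line.
first_only = re.split(r"^----$", subject_text, maxsplit=1, flags=re.MULTILINE)

print(all_parts)   # ['a\n', '\nb\n', '\nc']
print(first_only)  # ['a\n', '\nb\n----\nc']
```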

Splitting a string in Python by a specific line in the text