With the re library this should be a very simple task. However, I can't seem to split my string on the delimiters ] and [.
I have already read Splitting a string with multiple delimiters in Python, Python: Split string with multiple delimiters, and Python: How to get multiple elements inside square brackets.
My string:
data = """This is a string spanning over multiple lines.
At somepoint there will be square brackets.
[like this]
And then maybe some more text.
[And another text in square brackets]"""
It should return:
['This is a string spanning over multiple lines.\nAt somepoint there will be square brackets.','like this', 'And then maybe some more text.', 'And another text in square brackets']
A short example to try with:
data2 = 'A new string. [with brackets] another line [and a bracket]'
I tried:
re.split(r'(\[|\])', data2)
re.split(r'([|])', data2)
But these either leave the delimiters in my result list or produce a completely wrong list:
['A new string. ', '[', 'with brackets', ']', ' another line ', '[', 'and a bracket', ']', '']
The result should be:
['A new string.', 'with brackets', 'another line', 'and a bracket']
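As a special requirement, all newlines and whitespace before and after the delimiters should be removed and must not appear in the list.
Answer 0 (score: 7)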
>>> re.split(r'\[|\]', data2)
['A new string. ', 'with brackets', ' another line ', 'and a bracket', '']
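The output above still carries the surrounding spaces and a trailing empty string, so it does not yet meet the whitespace requirement from the question. A minimal sketch of one way to finish the job, assuming a list comprehension as post-processing (the names parts and cleaned are only illustrative):

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Split on either bracket; there is no capturing group, so the
# delimiters themselves are not returned.
parts = re.split(r'\[|\]', data2)

# Trim whitespace from each piece and discard pieces that end up empty
# (such as the trailing '' produced by the final ']').
cleaned = [p.strip() for p in parts if p.strip()]
print(cleaned)
# ['A new string.', 'with brackets', 'another line', 'and a bracket']
```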
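Answer 1 (score: 4)
As arshajii pointed out, you don't need any groups at all for this particular regex.
If you do need groups to express a more complex regex, you can use a non-capturing group to split without capturing the delimiters. That can be useful in other situations, but here it is syntactically messy overkill.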
(?:...)
A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
http://docs.python.org/2/library/re.html
So an unnecessarily complex but illustrative example here would be:
re.split(r'(?:\[|\])', data2)
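To make the difference concrete, here is a small side-by-side comparison of the three spellings on the question's short example. It is only an illustrative sketch, and the variable names are made up for the comparison:

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Capturing group: re.split() also returns the text matched by the group.
with_capture = re.split(r'(\[|\])', data2)

# Non-capturing group and no group at all: the delimiters are dropped.
without_capture = re.split(r'(?:\[|\])', data2)
no_group = re.split(r'\[|\]', data2)

print(with_capture)
# ['A new string. ', '[', 'with brackets', ']', ' another line ', '[', 'and a bracket', ']', '']
print(without_capture == no_group)
# True
```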
Answer 2 (score: 2)
Use this instead (no capturing group):
re.split(r'\s*\[|]\s*', data)
Or shorter:
re.split(r'\s*[][]\s*', data)
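Applied to the multi-line string from the question, a pattern of this kind should absorb the newlines next to each bracket. The final ] still leaves one empty string at the end, so the sketch below drops empty pieces. The helper name split_on_brackets is hypothetical, and the character class is written as [\[\]], which matches the same two characters as [][]:

```python
import re

data = """This is a string spanning over multiple lines.
At somepoint there will be square brackets.
[like this]
And then maybe some more text.
[And another text in square brackets]"""


def split_on_brackets(text):
    """Split text on '[' or ']' and drop whitespace around the brackets."""
    # \s* on both sides absorbs spaces and newlines adjacent to a bracket.
    pieces = re.split(r'\s*[\[\]]\s*', text)
    # A bracket at the very start or end of text leaves an empty string.
    return [p for p in pieces if p]


print(split_on_brackets(data))
```

Newlines that are not adjacent to a bracket, such as the one inside the first element, are left untouched, which is what the expected output in the question asks for.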
Answer 3 (score: 0)
You could either split or use findall, for example:
data2 = 'A new string. [with brackets] another line [and a bracket]'
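Using split, with \s* absorbing the whitespace around the brackets and filter(None, ...) dropping the empty strings: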
import re
# list(...) is needed on Python 3, where filter() returns an iterator
print(list(filter(None, re.split(r'\s*[\[\]]\s*', data2))))
# ['A new string.', 'with brackets', 'another line', 'and a bracket']
Or perhaps a findall approach:
# \b inside a character class means backspace, not a word boundary, so it is omitted here
print(re.findall(r'[^\[\]]+', data2))
# ['A new string. ', 'with brackets', ' another line ', 'and a bracket'] # needs a little work on leading/trailing stuff...
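One possible way to do that remaining work on the findall result is to strip each chunk and skip the chunks that are blank. This is a sketch only, with chunks and result as illustrative names:

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Collect every run of characters that contains no square bracket.
chunks = re.findall(r'[^\[\]]+', data2)

# Trim each run and skip runs that contain only whitespace.
result = [c.strip() for c in chunks if c.strip()]
print(result)
# ['A new string.', 'with brackets', 'another line', 'and a bracket']
```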