I need to split a string in Python without removing the delimiters.
For example:
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
import re
content = re.split(r'\s\d\s', content)
After this I get:
This\n
string is very big\n
i need to split it\n
into paragraph wise.\n
But this string\n
not a formated string.
But I want this:
This\n
1 string is very big\n
2 i need to split it\n
3 into paragraph wise.\n
4 But this string\n
5 not a formated string
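Answer 0 (score: 2)
Use Python's built-in re module.
With re.sub you can find each match of a regular expression and replace it with any string you like. In the replacement string, \g<0> refers to the whole match (here, the digit together with the whitespace on either side of it).
Example: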
import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
result = re.sub(r'\s\d\s',r'\n\g<0>',content)
The result will be:
'This\n 1 string is very big\n 2 i need to split it\n 3 into paragraph wise.\n 4 But this string\n 5 not a formated string.'
Here is the documentation for re.sub.
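If you want a list rather than one string, a small variant (not part of the answer above) is to replace only the whitespace in front of each number with a newline and then call splitlines(). This is a sketch that assumes the input contains no newlines of its own, since splitlines() would break on those too.

```python
import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'

# Replace only the whitespace that precedes "<digit><whitespace>" with a newline.
# The lookahead (?=\d\s) checks for the digit without consuming it, so the digit stays.
text = re.sub(r'\s(?=\d\s)', '\n', content)

# Break the text into a list on the inserted newlines.
lines = text.splitlines()
print(lines)
```

Unlike the \g<0> version, this leaves no leading space in front of each number.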
Answer 1 (score: 1)
You can use re.split with a lookahead:
import re
re.split(r'\s(?=\d\s)', content)
This gives:
['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']
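This splits on whitespace, but only on whitespace that is immediately followed by a digit and then another whitespace character. The lookahead (?=...) does not consume the digit, so it stays at the start of the next piece.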
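Another option, sketched here as an alternative rather than taken from the answer: if the pattern passed to re.split contains a capturing group, the matched delimiters are returned in the list as well. You can then glue each delimiter back onto the piece that follows it.

```python
import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'

# The capturing group makes re.split keep the delimiters as separate items:
# ['This', ' 1 ', 'string is very big', ' 2 ', ...]
parts = re.split(r'(\s\d\s)', content)

# Odd indices are delimiters, even indices (after the first) are the text that follows them.
result = [parts[0]]
for delim, piece in zip(parts[1::2], parts[2::2]):
    # Strip the whitespace around the number and reattach it to the following text.
    result.append(delim.strip() + ' ' + piece)
print(result)
```

This keeps the delimiter text exactly as matched, which helps when the delimiters are not predictable.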
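Answer 2 (score: 0)
Why not just store the output, iterate over it, and put the delimiters back where you want them? If the delimiter changes each time, you can use the loop index to work out which one belongs where.
You may find this post useful.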
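A minimal sketch of the loop-index idea, assuming (as in the example string) that the delimiters are the consecutive numbers 1, 2, 3, and so on. If they are not, this reconstruction is wrong and one of the regex approaches is safer.

```python
import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'

# Split as in the question; the numbers themselves are discarded here.
pieces = re.split(r'\s\d\s', content)

# Assumption: the separators were 1, 2, 3, ... in order,
# so the loop index alone is enough to put each one back.
result = [pieces[0]]
for i, piece in enumerate(pieces[1:], start=1):
    result.append(f'{i} {piece}')
print(result)
```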
Answer 3 (score: 0)
You can try this:
import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
[m.group(0).strip() for m in re.finditer(r'\S\d?[^\d]+', content)]
This pattern stops matching when it reaches a digit, but allows a digit at the start of each match.
Here is the output:
['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']
Answer 4 (score: 0)
If it is only a matter of newlines, use the string method splitlines() with keepends=True:
>>> "This\nis\na\ntest".splitlines(True)
['This\n', 'is\n', 'a\n', 'test']
Otherwise, you can do:
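def split(s, d="\n"):
    """Split s on d, keeping d at the end of each piece."""
    d = str(d)
    if d == "":
        raise ValueError("empty separator")
    f = s.find(d)
    if f == -1:
        return [s]
    l = []
    li = 0  # start index of the current piece
    add = len(d)
    while f != -1:
        l.append(s[li:f + add])  # piece up to and including the separator
        li = f + add
        f = s.find(d, li)
    e = s[li:]  # whatever follows the last separator
    if e:
        l.append(e)
    return l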
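The function above keeps each delimiter at the end of its piece, while the question wants the number at the start of the next piece. A regex-based variant of the same idea could look like the sketch below; split_before is a hypothetical helper name, not a standard function.

```python
import re

def split_before(s, pattern):
    """Hypothetical helper: split s at every match of pattern,
    keeping the match at the start of the following piece."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, s):
        pieces.append(s[last:m.start()])  # text up to (not including) the match
        last = m.start()                   # the match begins the next piece
    pieces.append(s[last:])                # remainder after the last match
    # Drop surrounding whitespace and any empty pieces (e.g. a match at index 0).
    return [p.strip() for p in pieces if p.strip()]

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
print(split_before(content, r'\s\d\s'))
```

Because it works from match positions, the same helper accepts any regex as the delimiter.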