Split a string without removing the delimiter in Python

Time: 2016-07-12 11:31:07

Tags: python split

I need to split a string in Python without removing the delimiter.

For example:

import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
# str.split() treats its argument as a literal string, so a pattern needs re.split
content = re.split(r'\s\d\s', content)

After this I get:

This\n
string is very big\n
i need to split it\n
into paragraph wise.\n
But this string\n
not a formated string.

But I want this:

This\n
1 string is very big\n
2 i need to split it\n
3 into paragraph wise.\n
4 But this string\n
5 not a formated string

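5 Answers:

Answer 0: (score: 2)

Use the re (regular expression) module that ships with Python. With re.sub you can find every match of a regular expression and replace it with a string of your choice. \g<0> in the replacement refers to the entire match (here the digit together with the whitespace on either side of it).

Example: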

import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
result = re.sub(r'\s\d\s',r'\n\g<0>',content)

The result will be:

'This\n 1 string is very big\n 2 i need to split it\n 3 into paragraph wise.\n 4 But this string\n 5 not a formated string.'

See the re.sub documentation for more in-depth details.
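If the leading space that \g<0> carries into each new line is unwanted, one possible variation is to capture only the digit and the whitespace after it, and reference that group in the replacement. This is a sketch rather than part of the original answer; the capture group and the name result are illustrative choices.

```python
import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'

# Match the whitespace before the digit, but capture only "digit + whitespace",
# so the replacement drops the leading space and keeps the number.
result = re.sub(r'\s(\d\s)', r'\n\1', content)

print(result)
```

Each number now starts its own line with no space in front of it.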

Answer 1: (score: 1)

You can use re.split with a lookahead:

import re
re.split(r'\s(?=\d\s)', content)

This results in:

['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']

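This splits on whitespace, but only on whitespace that is immediately followed by a digit and then another whitespace character. Because the lookahead does not consume what it matches, the digit stays in the output.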
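To see why the lookahead matters, the following sketch compares it with a pattern that consumes the digit. The names consumed and kept are illustrative and not from the original answer.

```python
import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'

# The whole pattern is the separator, so the digits are discarded.
consumed = re.split(r'\s\d\s', content)

# Only the first whitespace is the separator; (?=\d\s) merely checks
# what follows without consuming it, so the digits survive.
kept = re.split(r'\s(?=\d\s)', content)

print(consumed[1])  # string is very big
print(kept[1])      # 1 string is very big
```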

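Answer 2: (score: 0)

Why not simply store the output, iterate over it, and put the delimiters back where you want them? If the delimiter changes each time, you can use the loop index while iterating to work out which delimiter is needed.

You may find this post useful.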
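One way to sketch this split-then-reattach idea, assuming the delimiters are captured during the split rather than reconstructed from the loop index: a capturing group makes re.split return the delimiters alongside the text. The names tokens and pieces are illustrative.

```python
import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'

# With a capturing group, re.split returns the captured delimiters too,
# alternating: text, digit, text, digit, ...
tokens = re.split(r'\s(\d)\s', content)

# The first token has no delimiter in front of it.
pieces = [tokens[0]]

# Walk the (digit, text) pairs and glue each digit back onto its text.
for number, text in zip(tokens[1::2], tokens[2::2]):
    pieces.append(number + ' ' + text)

print('\n'.join(pieces))
```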

Answer 3: (score: 0)

You can try this:

import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
[i.group(0).strip() for i in re.finditer(r'\S\d?[^\d]+', content)]

This pattern stops matching as soon as it reaches a digit, but allows a digit at the start of each match.

Here is the output:

['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']

Answer 4: (score: 0)

If it is only a matter of newlines, use the string method splitlines() with keepends=True:

>>> "This\nis\na\ntest".splitlines(True)
['This\n', 'is\n', 'a\n', 'test']

Otherwise you can write your own:

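def split(s, d="\n"):
    """Split s on d, keeping d at the end of each piece."""
    d = str(d)
    if d == "":
        raise ValueError("empty separator")
    f = s.find(d)
    if f == -1:
        return [s]  # separator not present
    l = []
    li = 0  # last index: start of the current piece
    add = len(d)
    while f != -1:
        l.append(s[li:f + add])  # include the separator itself
        li = f + add
        f = s.find(d, li)
    e = s[li:]  # trailing text after the final separator
    if e:
        l.append(e)
    return l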
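For comparison, a more compact variation of the same idea can be sketched with str.partition, which returns the text before the separator, the separator itself, and the remainder. The function name split_keep is hypothetical, and unlike the function above it returns an empty list for an empty input string.

```python
def split_keep(s, sep="\n"):
    """Split s on sep, leaving sep attached to the end of each piece."""
    if sep == "":
        raise ValueError("empty separator")
    pieces = []
    while s:
        # partition returns (head, sep, rest); sep is '' when not found
        head, found, s = s.partition(sep)
        pieces.append(head + found)
    return pieces


print(split_keep("This\nis\na\ntest"))
print(split_keep("a, b, c", ", "))
```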