Question

I need to split a string in Python without removing the delimiters.

For example:

import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
content = re.split(r'\s\d\s', content)

After this I get:

This\n
string is very big\n
i need to split it\n
into paragraph wise.\n
But this string\n
not a formated string.

But I want this:

This\n
1 string is very big\n
2 i need to split it\n
3 into paragraph wise.\n
4 But this string\n
5 not a formated string

Answer 1

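Use Python's built-in re (regular expression) module. With re.sub you can find every match of a regular expression and replace it with the string you want. \g<0> refers to the entire match (in this case, the digit together with the whitespace on either side of it).

Example: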

import re

content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
result = re.sub(r'\s\d\s',r'\n\g<0>',content)

The result will be:

'This\n 1 string is very big\n 2 i need to split it\n 3 into paragraph wise.\n 4 But this string\n 5 not a formated string.'

See the re.sub documentation for more in-depth details.
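Note that the result above keeps a space right after each newline, because \g<0> re-inserts the whole match. A minimal sketch, assuming the goal is exactly the layout asked for in the question: replace only the whitespace in front of the digit, and use a lookahead so the digit and the space after it are checked but not consumed.

```python
import re

content = ('This 1 string is very big 2 i need to split it 3 into paragraph wise. '
           '4 But this string 5 not a formated string.')

# Replace only the whitespace that precedes "<digit><whitespace>".
# The lookahead (?=...) tests for the digit without making it part of the match.
result = re.sub(r'\s(?=\d\s)', '\n', content)
print(result)
```

Each line of the printed text then starts directly with its number, with no leading space.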

Answer 2

You can use re.split with a lookahead:

import re
re.split(r'\s(?=\d\s)', content)

Which results in:

['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']

This splits on whitespace, but only on whitespace that is immediately followed by a digit and then another whitespace character.
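For comparison, here is a rough sketch of another standard re.split behaviour: when the pattern contains a capturing group, the captured text is returned in the result list as well. The helper name split_keep_numbers is made up for illustration, and using \d+ (to allow multi-digit numbers) is my assumption, not something from the question.

```python
import re

def split_keep_numbers(text):
    """Split on ' <number> ' and re-attach each number to the chunk after it.

    Hypothetical helper for illustration only.
    """
    # With a capturing group, re.split returns the captured numbers too:
    # [chunk0, num1, chunk1, num2, chunk2, ...]
    parts = re.split(r'\s(\d+)\s', text)
    result = [parts[0]]
    for number, chunk in zip(parts[1::2], parts[2::2]):
        # The whitespace after the number is normalized to a single space.
        result.append(f'{number} {chunk}')
    return result

print(split_keep_numbers('This 1 string is very big 2 i need to split it'))
```

The lookahead version is shorter; this one may be handy if you also want the numbers as separate values.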

Answer 3

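Why not just store the output, iterate over it, and put the delimiters back where you want them? If the delimiter changes each time, you can use the loop index to work out which delimiter belongs where.

You may find this post useful.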
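One way to read this suggestion, as a sketch only: collect the pieces with re.split, collect the delimiters separately with re.findall, then walk both lists and glue each delimiter back onto the piece that follows it. The function name split_and_restore and the default pattern are my assumptions, and the pattern must not contain capturing groups, since those change what re.split and re.findall return.

```python
import re

def split_and_restore(text, pattern=r'\s\d\s'):
    """Split on pattern, then put each delimiter back in front of the next piece.

    Hypothetical helper; pattern must not contain capturing groups.
    """
    pieces = re.split(pattern, text)        # the text with delimiters removed
    delimiters = re.findall(pattern, text)  # the same delimiters, in order
    result = [pieces[0]]
    for delim, piece in zip(delimiters, pieces[1:]):
        # Drop the whitespace before the digit, keep the digit and the space after it.
        result.append(delim.lstrip() + piece)
    return result

print(split_and_restore('This 1 string is very big 2 i need to split it'))
```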

Answer 4

You can try this:

import re
content = 'This 1 string is very big 2 i need to split it 3 into paragraph wise. 4 But this string 5 not a formated string.'
[i.group(0).strip() for i in re.finditer(r'\S\d?[^\d]+', content)]

This pattern stops each match when it reaches a digit, but allows a digit at the start of the match.

Here is the output:

['This', '1 string is very big', '2 i need to split it', '3 into paragraph wise.', '4 But this string', '5 not a formated string.']
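If the numbers can have more than one digit, a slightly more explicit pattern may be easier to reason about. This is only a sketch under two assumptions of mine: every number is followed by whitespace, and the text between numbers contains no digits. The function name numbered_chunks is hypothetical.

```python
import re

def numbered_chunks(text):
    """Return chunks that each start with an optional number.

    Hypothetical helper; assumes the text between numbers contains no digits.
    """
    # (?:\d+\s)?  an optional leading number followed by one whitespace character
    # \D+         then everything up to the next digit
    return [m.group(0).strip() for m in re.finditer(r'(?:\d+\s)?\D+', text)]

print(numbered_chunks('Intro 10 first part 11 second part'))
```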

Answer 5

If it is only a matter of newlines, use the string method splitlines() with keepends=True:

>>> "This\nis\na\ntest".splitlines(True)
['This\n', 'is\n', 'a\n', 'test']

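Otherwise you can:

def split(s, d="\n"):
    """Split s on separator d, keeping d at the end of each piece."""
    d = str(d)
    if d == "":
        raise ValueError("empty separator")
    f = s.find(d)
    if f == -1:
        return [s]
    l = []
    li = 0  # index just past the last separator found
    add = len(d)
    while f != -1:
        l.append(s[li:f + add])  # include the separator itself
        li = f + add
        f = s.find(d, li)
    e = s[li:]  # trailing text after the final separator
    if e:
        l.append(e)
    return l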
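A shorter variant of the same idea, as a sketch built on str.partition rather than manual index bookkeeping. The name split_keep_sep is made up, and it behaves slightly differently on an empty input string: it returns [] where the function above returns [''].

```python
def split_keep_sep(text, sep="\n"):
    """Split text on a literal separator, keeping sep at the end of each piece.

    Hypothetical helper; returns [] for an empty input string.
    """
    if not sep:
        raise ValueError("empty separator")
    parts = []
    while text:
        # partition returns (before, sep, after); sep is '' when nothing is found
        head, found, text = text.partition(sep)
        parts.append(head + found)
    return parts

print(split_keep_sep("This\nis\na\ntest"))
print(split_keep_sep("a,b,", ","))
```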

Split a string without removing the delimiter in Python

5 answers: