How to add a newline after every occurrence of a pattern such as ".[xxx]" in a Python string

Time: 2019-07-05 06:32:41

Tags: python regex

I have the following string:

It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143]

I am trying to replace every occurrence of .[xxx] with .[xxx] \n

where each x is a digit.

I looked at several Stack Overflow answers for help, including:

Python insert a line break in a string after character "X"

Regex: match fullstop and one word in python

import re
str = ("It reported the proportion of the edits made from America was 51% "
       "for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia "
       "Foundation hopes to increase the number in the Global South to 37% by "
       "2015.[143] ")
x = re.sub("\.\[[0-9]{2,5}\]\s", "\.\[[0-9]{2,5}\]\s\n",str)
print(x)

I expected the following output:

It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia.[142]
The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143]

But I got:

It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia\\.\[[0-9]{2,5}\]\s   The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015\\.\[[0-9]{2,5}\]\s

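3 answers:

Answer 0: (score: 1)

You probably want to use a capture group and a backreference in re.sub. The replacement string is not a regular expression, so you also don't need to escape it ( regex101 ):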

import re
s = '''It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143] '''
x = re.sub(r'\.\[([0-9]{2,5})\]\s', r'.[\1] \n', s)
print(x)

Prints:

It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia.[142] 
The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143] 

Answer 1: (score: 1)

You can use

(\.\[[^][]*\])\s*

and replace each match with \1\n; see a demo on regex101.com.

Broken down:

(
    \.\[   # ".[" literally
    [^][]* # neither "[" nor "]" 0+ times
    \]     # "]" literally
)\s*       # consume any trailing whitespace
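
As a rough sketch of how this pattern could be applied in Python, the commented form above can be passed to re.compile with re.VERBOSE and used with sub(). The names CITATION and break_after_citations are made up for illustration and are not part of the original answer. Note that this pattern accepts anything between the brackets, not only digits, and that \s* swallows the space after the bracket.

```python
import re

# Verbose form of the pattern from this answer; re.VERBOSE allows the
# layout whitespace and comments to live inside the pattern itself.
CITATION = re.compile(r"""
    (
        \.\[     # ".[" literally
        [^][]*   # anything except "[" or "]", zero or more times
        \]       # "]" literally
    )
    \s*          # trailing whitespace, if any (consumed, not kept)
""", re.VERBOSE)


def break_after_citations(text):
    # \1 re-inserts the captured ".[...]" marker, followed by a newline.
    return CITATION.sub(r"\1\n", text)


if __name__ == "__main__":
    sample = "the simple Wikipedia.[142] The Wikimedia Foundation.[143] "
    print(break_after_citations(sample))
```

Because the whitespace is consumed by the match, each new line starts directly with the next sentence rather than with a leading space.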

Answer 2: (score: 1)

Use findall() to get a list of the substrings that match the pattern. You can then replace each one with itself followed by '\n'.
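
The answer gives no code, so the following is only one possible reading of it, assuming the digit-only pattern from the question and plain str.replace for the substitution. The function name break_after_citations_findall is hypothetical.

```python
import re


def break_after_citations_findall(text):
    # Collect the distinct ".[digits]" markers. A set is used so that a
    # marker appearing twice does not get its newline added twice.
    markers = set(re.findall(r"\.\[[0-9]{2,5}\]", text))
    for marker in markers:
        # str.replace substitutes every occurrence of this marker.
        text = text.replace(marker, marker + "\n")
    return text


if __name__ == "__main__":
    sample = "the simple Wikipedia.[142] The Wikimedia Foundation.[143] "
    print(break_after_citations_findall(sample))
```

Unlike the re.sub approaches, this keeps the space that followed each marker, so it ends up at the start of the next line. It also scans the text once per distinct marker, which a single re.sub call avoids.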