Python regex split, but keep some of the characters used for splitting

Date: 2018-12-18 13:28:37

Tags: regex python-3.x split

I have the following text:

text = "Perennials. Stolons slender. Perianth bristles 6 or 7, ca. 2 × as long as nutlet"

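I want to split the paragraph using a delimiter defined as '\.\s[A-Z]'. However, I still want to keep the [A-Z] letter in each resulting sentence, so that the output looks like this: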

['Perennials',
 'Stolons slender',
 'Perianth bristles 6 or 7, ca. 2 × as long as nutlet']

What I have done so far is:

re.split(r'\.\s[A-Z]', text)

But it removes the first letter of every sentence after the first:

['Perennials',
 'tolons slender',
 'erianth bristles 6 or 7, ca. 2 × as long as nutlet']

Can anyone help? Thanks~

1 answer:

Answer 0: (score: 2)

Split using a lookahead:

import re

result = re.split(r'\.\s(?=[A-Z])', text)
print(result)

['Perennials', 'Stolons slender', 'Perianth bristles 6 or 7, ca. 2 × as long as nutlet']

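The lookahead (?=[A-Z]) asserts that the dot and whitespace are followed by an uppercase letter, but it does not consume that letter. Only the dot and the whitespace are part of the match, so only they are removed by the split.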
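
To make the difference concrete, here is a minimal sketch that runs both patterns on the text from the question. The names SENTENCE_BREAK and split_sentences are hypothetical, introduced only for illustration; the two patterns are the ones already shown above.

```python
import re

# Hypothetical names for illustration; the pattern is the one from the answer.
SENTENCE_BREAK = re.compile(r'\.\s(?=[A-Z])')


def split_sentences(text):
    """Split on a dot plus whitespace only when an uppercase letter follows.

    The uppercase letter is checked by the lookahead but is not part of
    the match, so it stays at the start of the next piece.
    """
    return SENTENCE_BREAK.split(text)


text = "Perennials. Stolons slender. Perianth bristles 6 or 7, ca. 2 × as long as nutlet"

# Pattern from the question: [A-Z] is part of the match, so re.split drops it.
consumed = re.split(r'\.\s[A-Z]', text)

# Pattern from the answer: [A-Z] is only asserted, so it is kept.
kept = split_sentences(text)

print(consumed)
print(kept)
```

Note that "ca. 2" is not split, because the character after the dot and space is a digit rather than an uppercase letter. A period at the very end of the text would also be left in place, since no whitespace and uppercase letter follow it.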