I have the following text:
text = "Perennials. Stolons slender. Perianth bristles 6 or 7, ca. 2 × as long as nutlet"
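I want to split the paragraph using a delimiter defined as `\.\s[A-Z]` (a period, a whitespace character, then an uppercase letter). However, I still want to keep the [A-Z] letter at the start of each sentence, so that the output looks like this: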
['Perennials',
'Stolons slender',
'Perianth bristles 6 or 7, ca. 2 × as long as nutlet']
Here is what I have tried so far:
re.split(r'\.\s[A-Z]', text)
But it removes the first letter of every sentence after the first:
['Perennials',
'tolons slender',
'erianth bristles 6 or 7, ca. 2 × as long as nutlet']
Can anyone help? Thanks!
Answer 0 (score: 2)
Split using a lookahead:
result = re.split(r'\.\s(?=[A-Z])', text)
print(result)
['Perennials', 'Stolons slender', 'Perianth bristles 6 or 7, ca. 2 × as long as nutlet']
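The lookahead (?=[A-Z]) asserts that the period and whitespace are followed by an uppercase letter, but does not consume that letter. The letter is therefore not part of the delimiter and stays at the start of the next piece.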
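To make the difference concrete, here is a minimal side-by-side sketch of the two patterns on the text from the question. The variable names `consumed` and `kept` are illustrative only and not part of the original post.

```python
import re

text = "Perennials. Stolons slender. Perianth bristles 6 or 7, ca. 2 × as long as nutlet"

# Consuming pattern: the uppercase letter is matched as part of the
# delimiter, so re.split discards it along with the period and space.
consumed = re.split(r'\.\s[A-Z]', text)

# Lookahead pattern: only the period and the whitespace are matched;
# (?=[A-Z]) checks the next character without including it in the match.
kept = re.split(r'\.\s(?=[A-Z])', text)

print(consumed)
print(kept)
```

Note that "ca. 2" is not split in either case, because the period and space there are followed by a digit rather than an uppercase letter.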