Question

I have the following input:

input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

First, each sentence should be moved to a new line. Then all punctuation should be separated from the words, EXCEPT characters such as "/", "'", "-", "+" and "$".

So the output should be:

"I love programming with Python-3 . 3 ! 
Do you ?  
It's great . . . 
I give it a 10/10 . 
It's free-to-use , no $$$ involved !"

I used the following code:

>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-    to-use , no $$$ involved ! "

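But the problem is that it does not put each sentence on a new line. How can I use a regex to do that before adding the spaces between punctuation and characters?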

Answer 1

Something like this:

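>>> import re
>>> from string import punctuation
>>> # re.escape keeps characters such as ] \ ^ - literal inside the character class
>>> print(re.sub(r'(?<=[' + re.escape(punctuation) + r'])\s+(?=[A-Z])', '\n', input))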
I love programming with Python-3.3!
Do you?
It's great...
I give it a 10/10.
It's free-to-use, no $$$ involved!
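To cover the second half of the question as well, each resulting line can be tokenized and re-joined with single spaces. This is only a minimal sketch: the names `SENTENCE_BREAK`, `TOKEN`, `split_sentences` and `space_punctuation` are made up for illustration, and it assumes that "/", "'", "+", "$" and "-" are the only characters that should stay attached to words.

```python
import re
from string import punctuation

# Hypothetical helper names, chosen for this sketch only.
# Whitespace that follows any punctuation character and precedes a capital letter.
SENTENCE_BREAK = re.compile(r"(?<=[" + re.escape(punctuation) + r"])\s+(?=[A-Z])")
# A token is either a run of word characters plus the exempt characters
# / ' + $ -, or one single remaining punctuation character.
TOKEN = re.compile(r"[\w/'+$-]+|[^\w\s/'+$-]")

def split_sentences(text):
    """Replace the whitespace at each sentence break with a newline."""
    return SENTENCE_BREAK.sub("\n", text)

def space_punctuation(line):
    """Join the tokens of one line with single spaces."""
    return " ".join(TOKEN.findall(line))

text = ("I love programming with Python-3.3! Do you? It's great... "
        "I give it a 10/10. It's free-to-use, no $$$ involved!")

result = "\n".join(space_punctuation(s) for s in split_sentences(text).split("\n"))
print(result)
```

Note that the lookbehind accepts any punctuation character, so a comma followed by a capitalized word (for example "Hello, World") would also start a new line; restricting the class to `[.!?]` is one way around that.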

Answer 2

([!?.])(?=\s*[A-Z])\s*

You can use this regex to split the text into sentences before applying your own regex. See the demo; replace with \1\n.

https://regex101.com/r/sH8aR8/5

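import re
x = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
# The trailing \s* consumes the whitespace so new lines do not start with a space
print(re.sub(r"([!?.])(?=\s*[A-Z])\s*", r"\1\n", x))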

Edit:

(?<![A-Z][a-z])([!?.])(?=\s*[A-Z])\s*

Try this. See the demo with a different data set.

https://regex101.com/r/sH8aR8/9
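As far as I can tell, the added negative lookbehind `(?<![A-Z][a-z])` skips any mark that comes right after a capital letter plus a lowercase letter, which keeps short abbreviations such as "Mr." on the same line. A small sketch to see the effect; the sample sentence and the name `split_sentences` are made up here, not taken from the demo:

```python
import re

# Pattern from the edit above; the sample text below is invented for illustration.
PATTERN = re.compile(r"(?<![A-Z][a-z])([!?.])(?=\s*[A-Z])\s*")

def split_sentences(text):
    """Insert a newline after each sentence-ending mark that the pattern accepts."""
    return PATTERN.sub(r"\1\n", text)

print(split_sentences("Mr. Smith went home. He slept."))
```

The trade-off is that a sentence ending in a two-letter capitalized word (say "Ok.") is not split either, and longer abbreviations such as "Mrs." still are, so this is a heuristic rather than a full sentence tokenizer.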

Python: How can I use regex to split sentences onto new lines and then separate punctuation with spaces?
