Regex to match certain characters but not include a leading period

Time: 2014-01-29 16:43:14

Tags: python regex regex-lookarounds

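I have a string with some whitespace (line breaks) in it. I want to replace it with periods, except where the sentence already ends with a period.

For example: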

text = "This is the oldest European-settled town in the continental " \
   "U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a " \
   "scenic cruise aboard \r\n"

I am trying to use a regex to change it into the following:

text = "This is the oldest European-settled town in the continental " \
   "U.S. Explore the town at your leisure. Upgrade to add" \
   " a scenic cruise aboard."

What I have now is:

new_text = re.sub("(( )?(\\n|\\r\\n)+)", ". ", text).strip()

However, it does not handle sentences that already end with a period. Should I use a lookaround here, and if so, how?

Thanks in advance!!

3 answers:

Answer 0: (score: 2)

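You can add an optional "." to the regex: (( )?\.?(\\n|\\r\\n)+). If a "." is present before the line break, it is matched as well and replaced together with the line break by ". ".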
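As a minimal sketch, this is how the suggestion might look when applied to the sample string from the question, assuming the line breaks are real control characters. The pattern is the regex above written as a raw string; the variable names are only illustrative.

```python
import re

text = ("This is the oldest European-settled town in the continental "
        "U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a "
        "scenic cruise aboard \r\n")

# Optional space, optional period, then one or more line breaks.
pattern = r"(( )?\.?(\n|\r\n)+)"
new_text = re.sub(pattern, ". ", text).strip()
print(new_text)
```

One caveat: the optional space comes before the optional period in the pattern, so a line that ends with a period followed by a space (for example "leisure. \r\n") would still end up with two periods.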

Answer 1: (score: 1)

Well, I'm not sure whether you mean the \r\n to be literal or not, so...

Literal:

>>> import re
>>> text = r"This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
>>> result = re.sub(r'[ .]*(?:(?:\\r)?\\n)+', '. ', text).strip()
>>> print(result)
This is the oldest European-settled town in the continental U.S. Explore the town at your leisure. Upgrade to add a scenic cruise aboard.

ideone demo

Not literal:

>>> import re
>>> text = "This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
>>> result = re.sub(r'[ .]*(?:\r?\n)+', '. ', text).strip()
>>> print(result)
This is the oldest European-settled town in the continental U.S. Explore the town at your leisure. Upgrade to add a scenic cruise aboard.

ideone demo
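The two cases differ only in what the string actually contains. A minimal sketch of the distinction, using shortened sample strings that are illustrative rather than taken from the question:

```python
# Raw string: the backslashes are kept, so \r\n is four ordinary characters.
literal = r"U.S.\r\nExplore"
# Normal string: \r\n is two control characters (carriage return, line feed).
control = "U.S.\r\nExplore"

print(len(literal))  # 15
print(len(control))  # 13
print("\n" in literal, "\n" in control)  # False True
```

This is why the literal case needs `\\r` and `\\n` in the pattern (to match a backslash followed by a letter), while the non-literal case uses `\r` and `\n`.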

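I removed some unnecessary groups and converted some of the others to non-capturing groups.

I also converted (\\n|\\r\\n)+ to the slightly more performant (?:(?:\\r)?\\n)+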
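Since the question asks about lookarounds, here is a hedged sketch of how a negative lookbehind could do the same job, for comparison only. The function name `join_lines` and the two-pass structure are assumptions made for illustration and are not part of this answer; real (non-literal) line breaks are assumed.

```python
import re

def join_lines(text):
    # Pass 1: line breaks NOT preceded by a period become ". ".
    # The lookbehind also excludes space, \r and \n so that a match cannot
    # start in the middle of " \r\n" or inside a run of line breaks.
    text = re.sub(r"(?<![. \r\n]) *(?:\r?\n)+", ". ", text)
    # Pass 2: the remaining line breaks follow a period, so a space is enough.
    text = re.sub(r" *(?:\r?\n)+", " ", text)
    return text.strip()

sample = ("This is the oldest European-settled town in the continental "
          "U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a "
          "scenic cruise aboard \r\n")
print(join_lines(sample))
```

On the sample text this gives the same result as the single character-class pattern `[ .]*(?:\r?\n)+`, which gets there in one pass and is arguably the simpler choice.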

Answer 2: (score: 0)

If you just want to get rid of the newlines, use this:

text = "This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
text = text.replace('\r\n','')
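For reference, a small sketch of what this replacement produces on the sample text. Because each line break is replaced with an empty string, the sentences run together and no periods are added; the variable name `joined` is only illustrative.

```python
text = ("This is the oldest European-settled town in the continental U.S.\r\n"
        "Explore the town at your leisure\r\n"
        "Upgrade to add a scenic cruise aboard \r\n")

# Remove every \r\n pair without inserting anything in its place.
joined = text.replace('\r\n', '')
print(joined)
```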