Python: Remove all words that start with an uppercase letter but do not follow punctuation

Time: 2019-02-16 13:30:42

Tags: regex python-3.x

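I want to use a regular expression to remove every word that starts with an uppercase letter and satisfies both of the following conditions:

1) The uppercase letter is followed only by lowercase letters, optionally ending in "'s" (possessive), and the word may be followed by a punctuation mark (. , ? !).

2) The word does not come directly after ".", "!" or "?".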

I tried:

import re

myString='The name of her company is Water Company WC 123 WaTerCompany! She was going to meet Daniel. Why? Because Daniel is her boy friend. Patricia? The daughter of Susana! Look, Daniel\'s car is white'
regex='([A-Z][a-z\']*)(\s[A-Z][a-z\']*)*'
txt = re.sub(regex, " ", myString)        

I got:

name of her company is    123    !   was going to meet  .  ?   is her boy friend.  ?   daughter of  !  ,   car is white

I want:

name of her company is  WC 123 WaTerCompany! She was going to meet . Why? Because is her boy friend. Patricia? The daughter of ! Look, car is white

1 answer:

Answer 0 (score: 2)

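To remove whole words, you want to use \b boundary anchors so that you don't match parts of words. To avoid removing words that come after punctuation, you can use a negative lookbehind, provided there is always a fixed amount of whitespace between the punctuation mark and the first letter (Python's re module only supports fixed-width lookbehind patterns).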
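As a small illustration of why the \b anchors matter, here is a sketch using a made-up sample string (the variable names are mine, not from the question). Without the anchors, the capitalised pieces inside mixed-case words such as WaTerCompany are matched as well:

```python
import re

# Hypothetical sample text, chosen to contain mixed-case and all-caps words.
text = "WaTerCompany and WC met Daniel"

# Same core pattern, once without and once with word-boundary anchors.
without_boundaries = r"[A-Z][a-z]*"
with_boundaries = r"\b[A-Z][a-z]*\b"

print(re.findall(without_boundaries, text))  # ['Wa', 'Ter', 'Company', 'W', 'C', 'Daniel']
print(re.findall(with_boundaries, text))     # ['Daniel']
```

With the anchors in place, only words that consist entirely of one uppercase letter followed by lowercase letters can match.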

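I'll assume there is always exactly one space between the punctuation mark and the next letter. If that is not the case, you can normalise the input first by replacing runs of multiple spaces with a single space.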
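A minimal sketch of that normalisation step, assuming it is acceptable to collapse every run of whitespace (including newlines) into one space. The helper name normalize_spaces and the sample string are made up for illustration:

```python
import re

# Capitalised words that are not preceded by ".", "!" or "?" plus one whitespace character.
PATTERN = re.compile(r"\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b")


def normalize_spaces(text):
    """Collapse each run of whitespace into a single space (hypothetical helper)."""
    return re.sub(r"\s+", " ", text)


# Hypothetical input with irregular spacing after the punctuation marks.
raw = "we met Anna.  Then we left!   Nobody saw Bob"

# Without normalisation the lookbehind sees two spaces, so "Then" and "Nobody" are removed too.
unnormalized_result = PATTERN.sub(" ", raw)

# After normalisation only "Anna" and "Bob" are removed.
normalized_result = PATTERN.sub(" ", normalize_spaces(raw))
print(normalized_result)  # 'we met  . Then we left! Nobody saw  '
```

The lookbehind only inspects the two characters directly before the word, which is why the extra spaces defeat it in the unnormalised input.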

That gives the following regular expression for the words to remove:

\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b

And a demo:

>>> import re
>>> myString='The name of her company is Water Company WC 123 WaTerCompany! She was going to meet Daniel. Why? Because Daniel is her boy friend. Patricia? The daughter of Susana! Look, Daniel\'s car is white'
>>> regex = r"\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b"
>>> re.sub(regex, " ", myString)
'  name of her company is     WC 123 WaTerCompany! She was going to meet  . Why? Because   is her boy friend. Patricia? The daughter of  ! Look,   car is white'
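If the leftover runs of spaces are unwanted, one possible follow-up (not part of the demo above, just a sketch) is to replace the matches with an empty string and then collapse the remaining runs of spaces. The names my_string, removed and tidy are my own:

```python
import re

regex = r"\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b"
my_string = (
    "The name of her company is Water Company WC 123 WaTerCompany! "
    "She was going to meet Daniel. Why? Because Daniel is her boy friend. "
    "Patricia? The daughter of Susana! Look, Daniel's car is white"
)

# Remove the matched words entirely instead of substituting a space.
removed = re.sub(regex, "", my_string)

# Collapse runs of two or more spaces left behind, then trim both ends.
tidy = re.sub(r" {2,}", " ", removed).strip()
print(tidy)
```

The clean-up runs after the removal, so it does not interfere with the lookbehind, which needs the original single space after each punctuation mark.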

You can also try the pattern online at regex101.