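I want to use a regular expression to remove every word that starts with an uppercase letter and meets both of the following conditions:
1) The uppercase letter is followed only by lowercase letters, optionally a possessive "'s", or punctuation (. , ? !).
2) The word is not preceded by ".", "!" or "?".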
I tried:
import re
myString='The name of her company is Water Company WC 123 WaTerCompany! She was going to meet Daniel. Why? Because Daniel is her boy friend. Patricia? The daughter of Susana! Look, Daniel\'s car is white'
regex='([A-Z][a-z\']*)(\s[A-Z][a-z\']*)*'
txt = re.sub(regex, " ", myString)
I got:
name of her company is 123 ! was going to meet . ? is her boy friend. ? daughter of ! , car is white
I want:
name of her company is WC 123 WaTerCompany! She was going to meet . Why? Because is her boy friend. Patricia? The daughter of ! Look, car is white
Answer (score: 2):
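To remove whole words only, use the \b word-boundary anchor so the pattern never matches part of a word. To skip words that come right after sentence punctuation, you can use a negative lookbehind, provided there is always a fixed amount of whitespace between the punctuation and the first letter (lookbehinds in Python's re module must be fixed-width).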
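A small sketch of both points, using only the standard re module: without \b the letter pattern matches fragments inside WaTerCompany, and a variable-width lookbehind such as (?<![!?.]\s+) is rejected by Python, which is why a fixed amount of whitespace has to be assumed. The variable names are illustrative only.

```python
import re

# Without \b the letter pattern matches fragments inside a single word.
partial = re.findall(r"[A-Z][a-z]*", "WaTerCompany")
# With \b on both sides only complete words can match, so nothing does here.
whole = re.findall(r"\b[A-Z][a-z]*\b", "WaTerCompany")

# Python's re requires lookbehinds to be fixed-width: "\s+" inside one is rejected.
try:
    re.compile(r"(?<![!?.]\s+)[A-Z]")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(partial)            # ['Wa', 'Ter', 'Company']
print(whole)              # []
print(variable_width_ok)  # False
```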
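I'll assume there is always exactly one space between the punctuation and the next letter. You can always normalize the input first by replacing runs of multiple spaces with a single space.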
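The normalization step could look like this. It is a minimal sketch, assuming any run of whitespace (spaces, tabs, newlines) may be collapsed to one space; drop_capitalized is a hypothetical helper name, not part of the original answer.

```python
import re

# Pattern from this answer; its lookbehind expects exactly one whitespace char after . ! ?
PATTERN = r"\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b"

def drop_capitalized(text):
    """Hypothetical helper: normalize whitespace, then remove the matching words."""
    # Collapse every whitespace run to a single space so "!   She" becomes "! She",
    # which the fixed-width lookbehind can then recognise.
    normalized = re.sub(r"\s+", " ", text)
    return re.sub(PATTERN, " ", normalized)

text = "Hi!   She met Daniel."
print(repr(re.sub(PATTERN, " ", text)))  # "She" is removed: the lookbehind sees two spaces
print(repr(drop_capitalized(text)))      # ' ! She met  .'
```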
That gives the following regex for the words to remove:
\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b
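For readability, the same pattern can be written with re.VERBOSE, where whitespace is ignored and each part can carry a comment. This is a sketch; WORD_TO_DROP and the sample sentence are illustrative names and data, not from the original question.

```python
import re

# Same regex as above, spelled out piece by piece (re.VERBOSE ignores the layout).
WORD_TO_DROP = re.compile(r"""
    \b              # start at a word boundary (no partial words)
    (?<![!?.]\s)    # not preceded by . ! or ? followed by one whitespace char
    [A-Z]           # one uppercase letter
    [a-z]*          # followed only by lowercase letters
    (?:'s)?         # optional possessive 's
    \b              # end at a word boundary
""", re.VERBOSE)

sample = "Look, Daniel's car! She met Susana."
print(WORD_TO_DROP.findall(sample))  # ['Look', "Daniel's", 'Susana']
```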
And a demo:
>>> import re
>>> myString='The name of her company is Water Company WC 123 WaTerCompany! She was going to meet Daniel. Why? Because Daniel is her boy friend. Patricia? The daughter of Susana! Look, Daniel\'s car is white'
>>> regex = r"\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b"
>>> re.sub(regex, " ", myString)
'  name of her company is     WC 123 WaTerCompany! She was going to meet  . Why? Because   is her boy friend. Patricia? The daughter of  ! Look,   car is white'
Or try the pattern online at regex101.
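Because every removed word is replaced with " ", the result contains runs of spaces and a leading space. One way to get exactly the output the question asks for is to collapse the whitespace afterwards with str.split() and " ".join(); this is an optional post-processing step, not part of the original answer.

```python
import re

myString = 'The name of her company is Water Company WC 123 WaTerCompany! She was going to meet Daniel. Why? Because Daniel is her boy friend. Patricia? The daughter of Susana! Look, Daniel\'s car is white'
regex = r"\b(?<![!?.]\s)[A-Z][a-z]*(?:'s)?\b"

# split() with no argument breaks on any whitespace run and drops leading/trailing
# whitespace, so joining with " " leaves exactly one space between tokens.
result = " ".join(re.sub(regex, " ", myString).split())
print(result)
```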