Regex to remove periods from acronyms?

Time: 2016-10-22 20:52:29

Tags: python regex

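I want to remove the periods from acronyms in a string of text, but I also want to leave regular periods (such as the one at the end of a sentence) in place.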

The following sentence:

"The C.I.A. is a department in the U.S. Government."

should become

"The CIA is a department in the US Government."

Is there a clean way to do this in Python? So far I have two steps:

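import re

words = "The C.I.A. is a department in the U.S. Government."
# Step 1: drop the trailing period of each acronym (the first dot is escaped so it only matches a literal period).
words = re.sub(r'([A-Z]\.[A-Z.]*)\.', r'\1', words)
print(words)
# The C.I.A is a department in the U.S Government.
# Step 2: drop the remaining periods that sit directly before an uppercase letter.
words = re.sub(r'\.([A-Z])', r'\1', words)
print(words)
# The CIA is a department in the US Government.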

1 answer:

Answer 0 (score: 12)

Maybe this?

>>> import re
>>> s = "The C.I.A. is a department in the U.S. Government."
>>> re.sub(r'(?<!\w)([A-Z])\.', r'\1', s)
'The CIA is a department in the US Government.'

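This replaces a single dot that is preceded by a single uppercase letter, provided that the letter is not itself preceded by anything in the \w character set. The latter condition is enforced by the negative lookbehind assertion (?<!\w).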
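
As a rough sketch of how this pattern behaves on a few other inputs, the snippet below wraps it in a small helper. The name strip_acronym_periods and the extra test sentences are made up for illustration and are not part of the original answer. It suggests that a period after an undotted acronym such as "NASA." is kept, because the final letter is preceded by a \w character, while a sentence that ends in a standalone capital letter loses its period.

```python
import re

# Hypothetical helper name; the pattern is the one from the answer above.
ACRONYM_DOT = re.compile(r'(?<!\w)([A-Z])\.')

def strip_acronym_periods(text):
    """Drop the period after each uppercase letter not preceded by a word character."""
    return ACRONYM_DOT.sub(r'\1', text)

print(strip_acronym_periods("The C.I.A. is a department in the U.S. Government."))
# The CIA is a department in the US Government.

print(strip_acronym_periods("She works at NASA."))
# She works at NASA.

# Caveat: a sentence ending in a single capital letter loses its final period.
print(strip_acronym_periods("We chose plan B."))
# We chose plan B
```

If that last case matters for your text, the pattern would need an extra condition, for example requiring at least two dotted letters in a row. That goes beyond what the answer proposes.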