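I want to remove the periods from acronyms in a string of text, but I also want to leave regular periods (for example, at the end of a sentence) in place.
The following sentence:
"The C.I.A. is a department in the U.S. Government."
should become
"The CIA is a department in the US Government."
Is there a clean way to do this in Python? So far I have two steps: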
import re

words = "The C.I.A. is a department in the U.S. Government."
# Step 1: drop the trailing dot of each acronym
words = re.sub(r'([A-Z].[A-Z.]*)\.', r'\1', words)
print(words)
# The C.I.A is a department in the U.S Government.
# Step 2: drop every dot that is directly followed by a capital letter
words = re.sub(r'\.([A-Z])', r'\1', words)
print(words)
# The CIA is a department in the US Government.
Answer 0 (score: 12)
Maybe this?
>>> re.sub(r'(?<!\w)([A-Z])\.', r'\1', s)
'The CIA is a department in the US Government.'
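This replaces a single dot that is preceded by a single uppercase letter, provided that the letter is not itself preceded by anything in the \w
character set. The latter criterion is enforced by the negative lookbehind assertion (?<!\w).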
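For anyone who wants to try this outside the interactive prompt, here is a minimal sketch that wraps the same pattern in a function. The names ACRONYM_DOT and strip_acronym_dots are made up for illustration, and the extra inputs are my own examples, not from the question.

```python
import re

# Same pattern as in the answer: an uppercase letter that is not preceded
# by a word character, followed by a literal dot.
# ACRONYM_DOT and strip_acronym_dots are hypothetical names for this sketch.
ACRONYM_DOT = re.compile(r'(?<!\w)([A-Z])\.')


def strip_acronym_dots(text):
    """Remove the dot after each standalone uppercase letter."""
    return ACRONYM_DOT.sub(r'\1', text)


print(strip_acronym_dots("The C.I.A. is a department in the U.S. Government."))
# The CIA is a department in the US Government.

# The final "A" is preceded by "S", a word character, so the lookbehind
# blocks the match and the sentence-ending period survives.
print(strip_acronym_dots("He works at NASA."))
# He works at NASA.

# Limitation: a sentence ending in a lone capital letter loses its period.
print(strip_acronym_dots("We chose plan B."))
# We chose plan B
```

The last call shows a case the pattern probably cannot tell apart from an acronym on its own: a standalone capital letter at the end of a sentence looks the same as one letter of an abbreviation. If that matters for your text, you would likely need an extra rule on top of this.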