Wrong output when using re.sub()

Asked: 2014-09-25 13:38:38

Tags: python regex

Problem statement:

  

Insert an extra space after a period if the period is immediately followed by a letter.

Here is the code:

import re

string = "This   is  very funny  and    cool.Indeed!"

re.sub("\.[a-zA-Z]", ". ", string)

And the output:

"This   is  very funny  and    cool. ndeed!"

It replaces the first character after the '.' as well, so the 'I' is lost.

Is there a solution for this?

2 answers:

Answer 0 (score: 3)

You can use a positive lookahead assertion, which checks for the letter without consuming it as part of the match:

>>> re.sub(r"\.(?=[a-zA-Z])", ". ", string)
'This   is  very funny  and    cool. Indeed!'
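To see why the lookahead version keeps the letter, it may help to compare what each pattern actually matches. The following is a minimal sketch on the question's string; the variable names `consuming` and `lookahead` are only illustrative.

```python
import re

string = "This   is  very funny  and    cool.Indeed!"

# The original pattern matches the dot AND the letter, so re.sub replaces both.
consuming = re.search(r"\.[a-zA-Z]", string)

# The lookahead only checks that a letter follows; the match itself is just the dot.
lookahead = re.search(r"\.(?=[a-zA-Z])", string)

print(repr(consuming.group()))  # '.I'
print(repr(lookahead.group()))  # '.'
```

Because re.sub replaces exactly the matched text, only the second pattern leaves the 'I' in place.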

Alternatively, use a capturing group and a backreference:

>>> re.sub(r"\.([a-zA-Z])", r". \1", string)  # NOTE - r"raw string literal"
'This   is  very funny  and    cool. Indeed!'

FYI, you can use \S instead of [a-zA-Z] to match any non-whitespace character.
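Note that \S is broader than [a-zA-Z]: it also matches digits and punctuation, so the two patterns can behave differently on input such as an ellipsis or a decimal number. A small sketch, using a made-up sample string that is not from the question:

```python
import re

# Hypothetical input chosen to show where the two patterns diverge.
sample = "Wait...what? Version 1.5 is out.Nice"

# Only dots followed by an ASCII letter get a space.
letters_only = re.sub(r"\.(?=[a-zA-Z])", ". ", sample)

# Any dot followed by a non-whitespace character gets a space,
# including dots inside "..." and "1.5".
non_space = re.sub(r"\.(?=\S)", ". ", sample)

print(letters_only)  # Wait... what? Version 1.5 is out. Nice
print(non_space)     # Wait. . . what? Version 1. 5 is out. Nice
```

Which behavior is preferable depends on the text being processed; for ordinary prose the letter class is likely the safer choice.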

Answer 1 (score: 0)

You can also combine a lookbehind and a lookahead in the regex:

>>> import re
>>> string="This   is  very funny  and    cool.Indeed!"
>>> re.sub(r'(?<=\.)(?=[A-Za-z])', r' ', string)
'This   is  very funny  and    cool. Indeed!'

Alternatively, you can add a word boundary, \b, between the two assertions:

>>> re.sub(r'(?<=\.)\b(?=[A-Za-z])', r' ', string)
'This   is  very funny  and    cool. Indeed!'

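Explanation:

  • (?<=\.) looks behind for a literal dot.
  • (?=[A-Za-z]) asserts that the matched position must be followed by a letter.
  • If both hold, the empty position between the dot and the letter is replaced with a space.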
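The zero-width nature of this match can be checked directly. The following is a minimal sketch on the question's string: the match is an empty string sitting between the dot and the letter, so inserting the space removes nothing.

```python
import re

string = "This   is  very funny  and    cool.Indeed!"

match = re.search(r"(?<=\.)(?=[A-Za-z])", string)

# The match is empty: it starts and ends at the same index.
print(repr(match.group()), match.start() == match.end())  # '' True

# The characters on either side of that position are the dot and the letter.
print(string[match.start() - 1], string[match.start()])  # . I
```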