I extracted some text and want to clean it up with a regex.
I have learned basic regex, but I don't know how to build this one:
str = '''
this is
a line that has been cut.
This is a line that should start on a new line
'''
It should become:
str = '''
this is a line that has been cut.
This is a line that should start on a new line
'''
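The pattern r'\w\n\w'
seems to catch it, but I'm not sure how to replace the newline with a space without also replacing the last character of one word and the first character of the next.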
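For illustration, a plain substitution with that pattern shows the problem: both \w characters are part of the match, so they are replaced together with the newline. A minimal sketch (the variable name text is only for this example):

```python
import re

text = '''
this is
a line that has been cut.
This is a line that should start on a new line
'''

# \w\n\w consumes the word characters on both sides of the newline,
# so "s\na" is replaced as a whole and the "s" and "a" are lost.
broken = re.sub(r'\w\n\w', ' ', text)
print(broken)
```

On the sample this yields "this i  line that has been cut." for the first sentence, with two spaces where "s\na" used to be.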
Answer (score: 3)
You can use this lookbehind regex with re.sub:
>>> import re
>>> str = '''
... this is
... a line that has been cut.
... This is a line that should start on a new line
... '''
>>> # replace with a space (not ''), and strip() the newlines added by the triple quotes
>>> print(re.sub(r'(?<!\.)\n', ' ', str.strip()))
this is a line that has been cut.
This is a line that should start on a new line
>>>
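(?<!\.)\n
matches every newline that is not preceded by a dot. The lookbehind is zero-width: it checks the character before the newline without consuming it, so only the newline itself is replaced.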
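The lookbehind checks only for a literal dot, so a line ending in "?" or "!" would still be joined to the next line. A minimal sketch that widens the check to a character class; which characters count as sentence endings is an assumption about your text, and the sample string is made up for the example:

```python
import re

text = "Is this cut?\nNo, it ends here.\nThis one\nis cut."

# The answer's pattern: only a '.' before the newline blocks the join.
dot_only = re.sub(r'(?<!\.)\n', ' ', text)

# Widened lookbehind (assumed set of sentence-ending characters):
# '.', '?' or '!' before the newline all block the join.
punct = re.sub(r'(?<![.?!])\n', ' ', text)

print(dot_only)
print(punct)
```

With the dot-only pattern the question "Is this cut?" is glued to the following sentence; with the character class it stays on its own line.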
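If you don't want the match to depend on a dot, use:
re.sub(r'(?<=\w\s)\n', '', str)
This variant only matches a newline that comes right after a word character followed by one whitespace character, i.e. a line ending in a trailing space such as "this is ". The trailing space stays in place and becomes the separator, which is why the replacement here is the empty string; on the sample exactly as shown above, without trailing spaces, it finds no match.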
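Closer to the pattern in the question, the two \w can be turned into lookarounds so they are required but not consumed. This is a different technique from the answer's dot check, shown as a sketch alongside the equivalent capture-group form with \1 and \2 backreferences:

```python
import re

text = '''
this is
a line that has been cut.
This is a line that should start on a new line
'''

# Lookarounds are zero-width: a word character must sit on each side
# of the newline, but only the newline itself is replaced.
lookaround = re.sub(r'(?<=\w)\n(?=\w)', ' ', text)

# Capture-group form: the characters are matched, then written back
# through the \1 and \2 backreferences in the replacement string.
groups = re.sub(r'(\w)\n(\w)', r'\1 \2', text)

print(lookaround)
```

On the question's sample both forms produce exactly the requested output, including the leading and trailing newline, because neither pattern can match at the very start or end of the string or after the dot. One difference: in "a\nb\nc" the capture-group form joins only the first newline, since "b" is consumed by the first match, while the lookaround form joins both.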