Question

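I am parsing a text file and want to remove all line breaks inside paragraphs, while preserving the double line feeds that start a new paragraph. For example:

"This is my first poem\nthat does not make sense\nhow far should it go\nnobody can know.\n\nHere is a seconds\nthat is not as long\ngoodbye\n\n"

When printed, this looks like: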

This is my first poem
that does not make sense
how far should it go
nobody can know.

Here is a seconds
that is not as long
goodbye

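should become

"This is my first poem that does not make sense how far should it go nobody can know.\n\nHere is a seconds that is not as long goodbye\n\n"

Again, when printed, it should look like: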

This is my first poem that does not make sense how far should it go nobody can know.

Here is a seconds that is not as long goodbye

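The trick here is to remove single occurrences of '\n' while keeping the double line feeds '\n\n', AND to preserve whitespace (i.e. "hello\nworld" becomes "hello world" and not "helloworld").

I can do this by first replacing \n\n with a dummy string (such as "$$$" or something equally ridiculous), then removing the remaining \n, and then converting "$$$" back to \n\n ... but that seems overly roundabout. Can I do this conversion with a single regular expression call?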
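For reference, the roundabout placeholder approach described above might look roughly like the sketch below. The function name `join_lines_with_placeholder` and the guard against a placeholder collision are illustrative assumptions, not part of the original question.

```python
def join_lines_with_placeholder(text, placeholder="$$$"):
    """Three-step workaround: protect, replace, restore.

    The function name and the collision check are assumptions made
    for illustration only.
    """
    # The placeholder must not already occur in the input, otherwise
    # the final step would turn it into a paragraph break.
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    # Step 1: protect paragraph breaks.
    protected = text.replace("\n\n", placeholder)
    # Step 2: turn the remaining single newlines into spaces.
    joined = protected.replace("\n", " ")
    # Step 3: restore the paragraph breaks.
    return joined.replace(placeholder, "\n\n")


poem = "first line\nsecond line\n\nnext paragraph\nlast line\n\n"
print(repr(join_lines_with_placeholder(poem)))
```

Besides needing three passes, this version depends on choosing a placeholder that never appears in the input, which is part of why a single-pass alternative is attractive.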

Answer 1

You can replace every newline that is neither preceded nor followed by another newline with a space:

re.sub(r"(?<!\n)\n(?!\n)", " ", s)

See the Python demo:

import re
s = "This is my first poem\nthat does not make sense\nhow far should it go\nnobody can know.\n\nHere is a seconds\nthat is not as long\ngoodbye\n\n"
res = re.sub(r"(?<!\n)\n(?!\n)", " ", s)
print(res)

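Here, (?<!\n) is a negative lookbehind that fails the match if the newline is preceded by another newline, and (?!\n) is a negative lookahead that fails the match if the newline is followed by another newline.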
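To see how the two lookarounds behave at the boundaries, here is a small sketch that wraps the same pattern in a helper and runs it on a few short inputs. The names `SINGLE_NEWLINE` and `unwrap` are hypothetical and only used for this illustration.

```python
import re

# Same pattern as above, precompiled; the names are illustrative.
SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def unwrap(text):
    """Replace each isolated newline with a space."""
    return SINGLE_NEWLINE.sub(" ", text)


# An isolated newline is replaced, so the words stay separated.
print(repr(unwrap("hello\nworld")))
# In a pair, the first newline fails the lookahead and the second
# fails the lookbehind, so neither is touched.
print(repr(unwrap("a\n\nb")))
# Runs of three or more newlines are likewise left unchanged.
print(repr(unwrap("a\n\n\nb")))
```

Note that the pattern as written only considers "\n". Input with "\r\n" line endings would presumably need to be normalized first, for example with `text.replace("\r\n", "\n")`.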

For more detail, see Lookahead and Lookbehind Zero-Length Assertions.

Replace single newlines, keep multiples
