结果：

Question

如何在每次出现时将文本文件中的多个空白行减少为一行？

我已将整个文件读成字符串，因为我想在行结尾处进行一些替换。

with open(sourceFileName, 'rt') as sourceFile:
    sourceFileContents = sourceFile.read()

这似乎不起作用

while '\n\n\n' in sourceFileContents:
    sourceFileContents = sourceFileContents.replace('\n\n\n', '\n\n')

也不是这个

sourceFileContents = re.sub('\n\n\n+', '\n\n', sourceFileContents)

将它们全部剥离很容易，但我想在每次遇到它们时将多个空白行减少到一个空白行。

我觉得我很亲密，但却无法让它发挥作用。

Answer 1

这是一个范围，但也许某些行不是完全空白的（即它们只有空白字符，这些字符会出现空白）。您可以尝试删除换行符之间的所有可能空格。

re.sub(r'(\n\s*)+\n+', '\n\n', sourceFileContents)

编辑：意识到第二个'+'是多余的，因为\ s *将捕获第一个和最后一个之间的换行符。我们只是想确保最后一个字符肯定是换行符，因此我们不会从包含其他内容的行中删除前导空格。

re.sub(r'(\n\s*)+\n', '\n\n', sourceFileContents)

修改2

re.sub(r'\n\s*\n', '\n\n', sourceFileContents)

应该是一个更简单的解决方案。我们真的只想抓住我们的两个锚定换行符之间的任何可能空间（包括中间换行符），这些换行符将构成单个空行并将其折叠为仅两个换行符。

Answer 2

您的代码适合我。也许有可能会有回车\r。

re.sub(r'[\r\n][\r\n]{2,}', '\n\n', sourceFileContents)

Answer 3

您可以仅使用str方法进行拆分和合并：

text = "some text\n\n\n\nanother line\n\n"
print("\n".join(item for item in text.split('\n') if item))

Answer 4

我想另一种选择更长，但可能更漂亮？

with open(sourceFileName, 'rt') as sourceFile:
    last_line = None
    lines = []
    for line in sourceFile:
         # if you want to skip lines with only whitespace, you could add something like:
            # line = line.lstrip(" \t")
        if last_line != "\n":
            lines.append(line)
        last_line = line
 contents = "".join(lines)

我试图找到一些聪明的生成器函数来写这个，但这是一个漫长的一周所以我不能。

代码未经测试，但我认为它应该有用吗？

（编辑：一个好处是我删除了正常表达式的需要，修复了“现在你有两个问题”的问题:)）

（根据Marc Chiesa关于挥之不去的空白的建议的另一个编辑）

Answer 5

如果您使用以下内容替换您的阅读声明，那么您不必担心空白或回车：

with open(sourceFileName, 'rt') as sourceFile:
    sourceFileContents = ''.join([l.rstrip() + '\n' for l in sourceFile])

执行此操作后，您在OP中尝试的两种方法都可以正常工作。

或

只需在一个简单的循环中写出来。

with open(sourceFileName, 'rt') as sourceFile: lines = [''] for line in (l.rstrip() for l in sourceFile): if line != '' or lines[-1] != '\n': lines.append(line + '\n') sourceFileContents = "".join(lines)

Answer 6

如果您确定自己的行已完全为空，则可以使用positive lookahead将这些行替换为一行，如下所示：

sourceFileContents = re.sub(r'\n+(?=\n)', '\n', sourceFileContents)

regex positive lookahead

Answer 7

对于无法像我这样进行正则表达式的某人，您要处理的内容是python：

for x in range(100):
    content.replace("  ", " ")   # reduce the number of multiple whitespaces

# then
for x in range(20):
    content.replace("\n\n", "\n")   # reduce the number of multiple white lines

lmao

另一种糟糕的解决方案，以防万一您的内容不是python：

MAMP

请注意，如果您有100个以上的连续空格或20个连续的新行，则需要增加重复次数。

享受

Answer 8

使用re模块的非常简单的方法

import re

text = 'Abc\n\n\ndef\nGhijk\n\nLmnop'
text = re.sub('[\n]+', '\n', text) # Replacing one or more consecutive newlines with single \n

结果：

' Abc \ n def \ n Ghijk \ n Lmnop '

将多个空行减少为单个（Python）

8 个答案:

结果：